concept is used to develop a technique called the

CHAPTER 1.
INTRODUCTION AND PLAN OF THESIS

1.0 Introduction 

The primary goal of this thesis is to provide insight
into and shed additional light on several key problems in
the design and analysis of general storage hierarchy
systems.

1.1 Significance of Problem

The importance of research in storage hierarchy systems
has been pointed out by Prof. F. J. Corbató recently in the
MIT Project MAC Progress Report VIII (July 1971):

"By now, it has become accepted lore in the computer
system field that use of automatic management
algorithms for memory systems, constructed of
several levels with different access times, can
provide a significant simplification of programming
effort, ... Unfortunately, behind the mask of
acceptance hides a worrisome lack of knowledge
behind how to engineer a multilevel memory system
with appropriate algorithms which are matched to the
load and hardware characteristics."



On multiple level storage hierarchies, Prof. J. H.
Saltzer was even more explicit (subject notes on
"Information Systems", MIT, 1972, p. 4-58):

"An interesting problem arises if one has three or
more technologies to deal with. ... The problem of
predicting the performance of a three level,
automatically managed system is not at all well
understood, ... Although the need for more than one
level has already been argued, there is currently no
known criterion for introducing three, four, or n
levels for a given system. ... Although there are by
now many implementations of two level memory
systems, the dynamic management of a three or more
level memory system is such an uncharted area that
there do not yet exist examples of practical
algorithms which one can examine."



1.2 Specific Goals and Accomplishments

The specific goals and accomplishments of this thesis,
which are further elaborated later, are:

• Analyze the effect of certain parameters, such as
  page size, upon the performance of a storage
  system.

• Develop a concept of locality based upon both
  spatial and temporal adjacency in address
  reference patterns that explains certain anomalies
  discovered in actual paging systems.

• Propose, formalize, and measure the performance of
  new "spatial-removal" storage management
  algorithms, in particular "tuple-coupling".

• Design a practical algorithm for effective
  management of multiple level storage hierarchy
  systems and demonstrate its effectiveness under
  some simulated system loads.

1.3 General Structure of Thesis

The key plan of this thesis is to investigate several 
crucial problems and requirements of multiple level storage 
hierarchy systems. Particular areas are identified and 
corresponding theories developed and proven. A new and 
general design for storage hierarchy systems is also 
presented and evaluated. Finally, empirical measurements are 
presented to validate and calibrate the overall design and 
specific theoretical conjectures. 

This thesis is organizationally divided into 8
chapters. The structure can be best introduced by outlining
the content of the following chapters in the sections below.

1.3.1 Chapter 2: Motivation for Storage Hierarchy Systems 

This chapter presents a perspective on the storage
hierarchy problem and the motivation for such systems. It
is primarily written for the benefit of people knowledgeable
in the general computer field but who are not especially
experienced in storage hierarchy systems. For the expert
reader, this chapter exposes the biases and orientations of
the author and thus sets the tone for the remainder of the
thesis. This chapter also briefly reviews the history of
research in storage systems and cites numerous references.

1.3.2 Chapter 3: Formalization of Storage Hierarchy Systems 

A description and formalization of the basic
characteristics of storage hierarchy systems is presented in
this chapter. This is followed by a summary and critical
analysis of research that directly relates to the specific
goals of this thesis.

1.3.3 Chapter 4: A Storage Hierarchy System

In this chapter the key concepts of the proposed
storage hierarchy system are presented and discussed. The
principal and novel techniques are briefly described below:

1.3.3.1 Continuous Hierarchy 

The ratio of performance between adjacent levels is
kept moderate (e.g., a factor of 100 or less) to minimize
discontinuities or awkward special-case algorithms. This is
in contrast to many current systems with inter-level ratios
of 1000 or more.

1.3.3.2 Shadow Storage and Page Splitting

Information is transferred in progressively smaller
blocks as it is passed up from low performance levels of the
hierarchy toward the "request generator" at the uppermost
level. Thus, the information that is finally received by the
request generator has left a "shadow" behind in the lower
levels. The significance and rationale for this technique are
further elaborated in Chapter 6.
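
As an illustration only, the page-splitting idea can be
sketched in a few lines of Python. The function name
fetch_with_page_splitting and the block sizes used below are
hypothetical and are not taken from the thesis; the sketch
merely assumes that each level aligns its transfer block on a
multiple of its own block size.

```python
def fetch_with_page_splitting(address, block_sizes):
    """Return the block held at each level after one fetch.

    block_sizes lists the (hypothetical) transfer block size of each
    level, ordered from the lowest-performance level up to the level
    nearest the request generator, so the sizes are decreasing.
    """
    shadows = []
    for size in block_sizes:
        start = (address // size) * size   # align to this level's block
        shadows.append((start, start + size))
    return shadows


# The request for address 5000 leaves a progressively smaller
# "shadow" at each level on the way up.
for low, high in fetch_with_page_splitting(5000, [4096, 1024, 256]):
    print(f"addresses {low}..{high - 1}")
```

With block sizes that divide one another, each shadow is
contained in the shadow left at the level below it.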

1.3.3.3 Automatic Management 

In order to reduce the load on the central processor
and provide for more efficient and parallel operations, the
storage management function will be distributed and
incorporated into the storage levels (e.g., "intelligent"
device controllers [1], etc.). This technique also reduces
the complexity of the operating system software.

1.3.3.4 Direct Transfer 

Storage transfers between two adjacent levels need not
have any effect upon nor require the assistance of any other
levels (e.g., there is no need to move information from
level n to level 1 and then from level 1 to level n-1 if
only level n to level n-1 was needed; this two step process
is often required on contemporary systems). Direct transfer
is accomplished by synchronizing non-mechanical storage
devices or by using "rubber-band" buffers [33] between
electro-mechanical storage devices.

1.3.3.5 Read Through 

Storage transfers, as noted above, are only made
between adjacent levels of the hierarchy, such as from level
n to level n-1. But each level, such as level n-1, can
connect its input bus (from lower level n) to its output bus
(to higher level n-2) so that the data can be read through
(i.e., transferred to level n-2 while being stored in level
n-1). A similar, though specialized, technique is already
used in certain systems, such as the IBM System/370 Models
155 and 165 cache systems [52].

This results in performance similar to a direct
connection from each level to the request generator but it
provides much more control in the storage levels and a much
simpler structure.
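
The net effect of read through can be sketched as follows.
This is a minimal model under stated assumptions, not the
design of Chapter 4: each level is represented by a Python
dict, levels[0] is the level nearest the request generator,
and the name read_through is hypothetical. The sketch captures
only the outcome (every intermediate level keeps a copy as the
data passes upward), not the bus timing.

```python
def read_through(levels, key):
    """Fetch key, storing a copy in every level the data passes.

    levels[0] is the highest (fastest) level; levels[-1] the lowest.
    Returns the value and the depth at which it was found.
    """
    for depth, level in enumerate(levels):
        if key in level:
            value = level[key]
            # The data is "read through" each higher level: it is
            # stored there while being passed on toward the top.
            for upper in levels[:depth]:
                upper[key] = value
            return value, depth
    raise KeyError(key)


hierarchy = [{}, {}, {"x": 7}]
print(read_through(hierarchy, "x"))   # found at the lowest level
print(read_through(hierarchy, "x"))   # now found at the top level
```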



Storage Hierarchy Systems 15 

1.3.3.6 Store Behind 

By using the excess capacity of the inter-level
channels, there is a continual flow of altered data from the
higher levels to the lowest level permanent storage. Thus,
the actual updated information is stored behind (after) the
store initiation from the request generator. The updated
information is propagated down, level by level. Whenever
information is altered at a particular level, it is tagged
as altered and is scheduled for a "store behind" operation.
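
A minimal sketch of the store-behind bookkeeping is given
below. The class name StoreBehindHierarchy and its methods are
hypothetical, and the sketch assumes that one call to
propagate_step stands for one opportunity to use spare channel
capacity; the actual scheduling policy is not specified here.

```python
class StoreBehindHierarchy:
    """Toy model: level 0 is the top, the last level is permanent storage."""

    def __init__(self, level_count):
        self.levels = [dict() for _ in range(level_count)]
        self.altered = [set() for _ in range(level_count)]

    def store(self, key, value):
        # The request generator only waits for the top-level store.
        self.levels[0][key] = value
        self.altered[0].add(key)          # tag as altered

    def propagate_step(self):
        """Move every altered item exactly one level down."""
        last = len(self.levels) - 1
        # Work from the bottom up so an item moves one level per step.
        for depth in range(last - 1, -1, -1):
            for key in sorted(self.altered[depth]):
                self.levels[depth + 1][key] = self.levels[depth][key]
                if depth + 1 < last:
                    self.altered[depth + 1].add(key)
            self.altered[depth].clear()


h = StoreBehindHierarchy(3)
h.store("p", 1)
h.propagate_step()      # "p" reaches level 1
h.propagate_step()      # "p" reaches permanent storage at level 2
```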

1.3.4 Chapter 5: Analysis of Page Size Considerations

One of the most important parameters of a storage
hierarchy system is the page size chosen as the unit of
transfer between two levels of the hierarchy. In this
chapter, the factors influencing page size are examined from
the device characteristics viewpoint and the program
behavior viewpoint.

Of particular concern, it has been noticed by Hatfield
[47] and Seligman [78] and formalized in Chapter 5 that:

"There exists a page trace, P, and demand-fetch
FIFO-removal or LRU-removal inter-level storage
systems, S and S', with page sizes N and N'=N/2,
respectively, such that the ratio, r, of fetch
frequency f' to f exceeds 2."

This result runs counter to the hoped-for behavior of
decreased page sizes as noted by Denning [25]:



" ... small pages permit a great deal of compression 
without loss of efficiency. Small page sizes will 
yield significant improvements in storage 
utilization ... " 



In this chapter the significance of this problem is
demonstrated by proving that even "well-behaved" removal
algorithms, such as stack algorithms [63], are not immune to
this adverse performance behavior. Furthermore, the nature
of this phenomenon is analyzed and bounds on its behavior
are developed.
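
The existence claim quoted above can be checked on a small
constructed example. The sketch below is not taken from
Chapter 5: the trace, the 4-word storage capacity, and the
function name lru_fetch_count are all assumptions chosen for
illustration, and both systems are assumed to have the same
total capacity. It simulates demand-fetch LRU-removal with
page sizes N = 2 and N' = 1.

```python
from collections import OrderedDict


def lru_fetch_count(addresses, page_size, capacity_words):
    """Count page fetches under demand-fetch, LRU-removal."""
    frames = capacity_words // page_size
    resident = OrderedDict()              # least recently used first
    fetches = 0
    for address in addresses:
        page = address // page_size
        if page in resident:
            resident.move_to_end(page)    # mark as most recently used
        else:
            fetches += 1
            if len(resident) >= frames:
                resident.popitem(last=False)   # remove the LRU page
            resident[page] = True
    return fetches


# Word 0 is referenced often enough to keep its full-size page
# resident, which also protects word 1; with half-size pages
# word 1 is removed before each reuse.
trace = [1, 2, 3, 0, 4, 5, 0] * 10
f_full = lru_fetch_count(trace, page_size=2, capacity_words=4)
f_half = lru_fetch_count(trace, page_size=1, capacity_words=4)
print(f_full, f_half, f_half / f_full)    # 21 fetches versus 51
```

For this trace the half-size pages cause 51 fetches against 21
for the full-size pages, a ratio of about 2.4.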

1.3.5 Chapter 6: Spatial vs. Temporal Locality Model of
Program Behavior

A primary rationale for hierarchical storage systems is
based upon the "Principle of Locality". Unfortunately, this
principle is still a poorly understood, or at least
controversial, phenomenon. It is difficult to determine the
original "discoverer" of this principle but it is
interesting to note that its definition has changed over time.
For example, Denning [29, p. 3] in 1968 loosely described
locality as:

"the idea that a computation will, during an
interval of time, favor a subset of the information
available to it."



Later, in 1970, Denning [26, p. 180] defined it more
precisely based upon the concepts of "working set" and
"reference density", which for a page i at time k is:

    a(i,k) = Pr[reference r(k) = i],

such that R(k) is the ranking of all n pages based upon
a(i,k); thus:

"PRINCIPLE OF LOCALITY: The rankings R(k) are
strict and the expected ranking lifetimes long."

This is a much more restrictive definition of locality than
his earlier general concept.

In fact, many current storage management systems were
devised first, a general model was then constructed to
describe the system, and finally a "formal" definition of
locality was developed to be consistent with the storage
management model. This is a reasonable strategy as long as
the underlying concepts of "the principle of locality" are
not lost in the process. Unfortunately, this appears to
have happened on several occasions. In particular, most
popular definitions of locality tend to be useless for
analyzing or explaining either the relationship between page
size and program behavior or the impact of generalizing from
two-level storage systems to multiple level hierarchical
storage systems.

In this chapter a new view of locality is presented (or
an old view resurrected, since it most closely resembles some
of the very early descriptions of locality). In particular,
it is shown that the general concept of locality can be
subdivided into two separate factors, temporal locality and
spatial locality. These concepts are defined and justified
and then used to explain some peculiar phenomena
("anomalies") observed in actual two-level storage systems.

By means of address traces and storage system
simplifications, the temporal and spatial locality behavior
of actual programs is empirically measured. These results
are used to reinforce and calibrate the storage hierarchy
system design presented in Chapter 4.
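
To make the distinction concrete, one possible way of
separating the two kinds of locality in an address trace is
sketched below. This is only an illustrative measure and
should not be read as the formal definitions of Chapter 6: the
function name locality_fractions and the parameters window and
neighborhood are assumptions. A reference is counted as
temporal if the same address appeared among the preceding
window references, and as spatial if it did not but a nearby
address did.

```python
def locality_fractions(addresses, window=8, neighborhood=4):
    """Return (temporal, spatial) fractions of the references."""
    if not addresses:
        return 0.0, 0.0
    temporal = spatial = 0
    for index, address in enumerate(addresses):
        recent = addresses[max(0, index - window):index]
        if address in recent:
            temporal += 1                 # same address reused soon
        elif any(abs(address - earlier) <= neighborhood
                 for earlier in recent):
            spatial += 1                  # a neighbor was used recently
    total = len(addresses)
    return temporal / total, spatial / total


print(locality_fractions([100, 101, 102, 103]))   # sequential sweep
print(locality_fractions([100, 900, 100, 900]))   # two reused addresses
```

The sequential sweep is classified as purely spatial and the
alternating pair as purely temporal.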

1.3.6 Chapter 7: Spatial Removal Storage Management
Algorithms

Various hierarchy storage management algorithms, such
as fetch (e.g., demand-fetch) and temporal removal (e.g.,
first-in first-out (FIFO), least recently used (LRU), etc.),
have been developed, primarily for two-level hierarchies.
There appear to be no spatial removal algorithms described
in the literature. Based upon Chapter 6, several spatial
algorithms are proposed and analyzed.

It is also shown that some of the problems described in
Chapter 5 can be solved by spatial removal algorithms. In
particular, Hatfield [48] noted that:

"As yet we have been unable to prove that there is a
replacement algorithm using only the past history of
page requests which cannot generate more than twice
the exceptions with half size pages."

In this chapter a new algorithm, named tuple-coupling, is
presented. It is formally proven that it satisfies
Hatfield's requirements above.

Furthermore, the operational behavior of tuple-coupling 
is analyzed by measuring the performance of actual programs. 

1.3.7 Chapter 8: Discussion and Conclusions

In addition to a general summary of the significant
aspects of the thesis, this chapter also outlines important
areas for future research.

CHAPTER 2.
THE STORAGE HIERARCHY PROBLEM

2.0 Introduction 

The evolution of computer systems has been marked by a
continually increasing demand for faster, larger, and more
economical storage facilities. In addition to the obvious
concern for better performance, the organization of a
computer system's storage plays a key role in program
development and programmer efficiency. It has often been
claimed that "any software design blunder can be overcome by
adding more memory".

It has become generally recognized that the conflicting
requirements of high-performance yet low-cost storage may be
best satisfied by a mixture of technologies combining
expensive high-performance devices with inexpensive
lower-performance devices. This strategy has been given
several names, such as "hierarchical storage system",
"automatic multilevel storage management", "virtual memory",
and the inevitable "virtual memory system for the automatic
multilevel management of a hierarchy of storage devices".
In this thesis the somewhat shorter term storage hierarchy
system will be used.

Investigations into automated storage hierarchy
techniques can be traced back more than a decade. If we
were to include manual techniques, we would find storage
hierarchies at the very dawn of the "computer age".
Unfortunately, there are still many unsolved and poorly
understood problems. This situation can be partly explained
by the fact that these systems tend to be (1) extremely
complex, (2) ill-suited to most conventional analytical
techniques, and (3) deeply influenced by the rapidly evolving
computer technology which keeps "changing the ground rules"
at often frightening rates. In spite of these challenging
stumbling blocks, a successful storage hierarchy system is
so important to the future usefulness of computer systems
that we cannot afford to abandon the search.

2.1 Storage Hierarchy Objectives

Before delving into details, it is worthwhile to
briefly consider the needs and uses for an effective storage
hierarchy.

2.1.1 System Performance and Economics

As logic technology and computer architecture
techniques have advanced, we have found it possible to
introduce systems of incredible speed. Such systems are often
rated, rather crudely, in terms of MIPS (millions of
instructions per second). Experimental systems of over 100
MIPS have been developed (e.g., ILLIAC IV and CDC STAR).
Even "conventional" large-scale systems have passed the 5 or
10 MIPS mark (e.g., CDC 7600 and IBM 370/195). It has long
been observed that the input/output (I/O) requirements,
especially for "secondary storage", of a conventional system
tend to be strongly related to the processor's speed. In
fact, based upon several empirical measurements, it has been
postulated that a computer system averages 1 bit of I/O for
every instruction executed (this is often referred to as
Amdahl's Constant [ref]). As a result, many of these
high-performance systems have been confronted with massive
bottleneck problems in the I/O area, especially since these
I/O demands tend to occur in bursts. An effective storage
hierarchy system could go a long way toward reducing this
problem.
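
As a rough back-of-the-envelope sketch, the rule of thumb of
1 bit of I/O per instruction can be turned into an average I/O
rate. The function name io_demand_bits_per_second is
hypothetical, and the figures are averages only; as noted
above, the actual demand arrives in bursts.

```python
def io_demand_bits_per_second(mips, bits_per_instruction=1):
    """Average I/O demand implied by the 1-bit-per-instruction rule."""
    return mips * 1_000_000 * bits_per_instruction


for mips in (5, 10, 100):
    bits = io_demand_bits_per_second(mips)
    print(f"{mips:>3} MIPS -> {bits:,} bits/s "
          f"({bits // 8:,} bytes/s)")
```

Under this assumption a 10 MIPS processor averages 10 million
bits of I/O per second, or 1.25 million 8-bit bytes per second.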

At the other end of the spectrum we find that medium-
and low-cost processors (the latter are usually called
mini-computers) have made substantial advances in recent
years. The term "mini" can be quite misleading. These
processors are typically hundreds of times faster than the
early commercial computers at a fraction of the cost (e.g.,
the UNIVAC I, circa 1951, could perform about 2000 12-digit
additions per second whereas contemporary mini-computers
operate at around 1,000,000 5-digit additions per second).
Although these mini-processors may be midgets compared to
the computational problems attacked by their "big brothers"
described above, they are more than adequate for the vast
majority of information processing problems which have modest
computational requirements. Due to technological advances
and economies of scale resulting from large-scale
production, some minicomputers are available for less than
$2000 with slightly slower micro-computers being offered for
as little as $66 [13]. In spite of these technological
advances, these processors have not had much impact on most
information system needs due to the continuing economic
problem of producing large capacity inexpensive storage
devices even at the modest performance required. A $66
processor is largely irrelevant if the storage costs are in
the $100,000 or more range. By developing an effective
storage hierarchy system, we can go a long way toward
bringing the storage costs down to the level of these
inexpensive processors. As a result, a tremendous number of
currently known technical solutions to information
processing problems will finally become economically
feasible solutions.

2.1.2 Simplify and Automate Programming

As noted earlier, the organization of a computer's
storage system has a considerable impact upon program
development and programmer efficiency. To a large extent,
this potential increase in productivity is obtained by
reducing or eliminating constraints normally imposed by the
storage system. These constraints often distract the
programmer to the extent that he devotes a substantial
amount of his time to overcoming the system's limitations
rather than solving the intrinsic problems. Shooman [80]
noted that:



"The inherent error content of some programs is
claimed to be related to the excess memory capacity
available. The theory here is that if the memory is
very cramped, the software writers will have to
resort to overlays and other coding "tricks" to
squeeze the desired functions into the allocated
memory space. It is assumed that these tricks
introduce great complexity and are the seat of many
errors. This effect is cited by designers of
airborne computers where the allocation of another
block of 4k of memory is a major design decision."



For example, the programmer often has to worry about:

2.1.2.1 Programming language code efficiency.

If a higher-level language compiler tends to produce
programs that are at all larger than those produced by a
low-level language translator, it may be necessary to use
the low-level language to conserve storage. This constraint
is contrary to the generally accepted fact that high-level
languages enhance programming productivity.

2.1.2.2 Program size. 

For any specific storage size, there are programs that
cannot be easily written to fit into that size constraint.
Yet, programmers frequently try - with considerable effort.

2.1.2.3 Data structures. 

The programmer is often faced with the need to choose
between a data structure representation that is convenient
to use and another representation that "saves storage".
This saving may require the use of an awkward or
unnecessarily complex data structure representation.




2.1.2.4 Specific equipment characteristics. 

If the programmer must get the "most" out of his
storage system in terms of capacity and performance, he may
resort to techniques that are peculiar to his specific
storage system equipment. If the equipment is changed,
there may be a considerable impact upon his software.

We would like to develop storage hierarchy techniques
that eliminate, automate, or at least minimize the
programming problems described above.

2.1.3 Integrate New Technologies and Applications 

Although there has been continual evolution, the basic
storage device technologies in commercial use have not
changed dramatically in the past decade. As a result, there
has been a tendency, motivated by actual need, to relate
applications to the specific available technologies. This
has caused certain application areas to be abandoned as
"infeasible" and many storage management strategies to be
discredited as "irrelevant" or "inefficient". In the passage
of time we remember the applications and techniques in use
but frequently forget or ignore the alternatives possible
and the reasons for bypassing these alternatives.

After this rather long "rest", it appears that we are
on the verge of some major "awakenings" in applications and
technology. It is hard to quantify the new application
needs other than requiring more and faster storage for less
money. Section 1.1.2 presents some of these motivations; the
revitalized interests in time-sharing, artificial
intelligence, and automatic programming are also "fanning the
fire".

Due to the uncertainty of advanced research in storage
device technologies, it is difficult to foresee accurately
which of the many active efforts will succeed (see, for
example, Ayling [7], Bast [15], Bobeck [16], Camras [17],
Dall [24], Fields [35], Gardner [39], Howard [50], Matick
[SHatick.], Myers [59], Rector [74], Shahbender [79],
Thompson [85]). Considering the technical advances clearly
demonstrated in the laboratory and the driving "profit"
motivation, it is reasonable to expect some dramatic changes
in the next few years. Even if we don't know what or when,
we would be foolhardy to totally ignore this situation.

Table 1 below indicates the performance and price 
characteristics of typical current-day storage technologies. 
The two entries marked by question marks (?), Bulk Store and
Giant Store, indicate new technologies that have already
been placed in limited use. Since these two
cost/performance positions were not part of our
"traditional" technologies, we are faced with the problem of
possibly modifying our applications and developing new
strategies to efficiently, effectively, and, hopefully,
optimally integrate them into our overall hierarchical
storage system.

As the entire spectrum of computer architectures, as
well as storage device technologies, undergoes reshufflings,
both evolutions as well as revolutions, it is worthwhile to
review and reconsider our current concepts on storage system
design. Table 1, although a simplified summary of current
storage technologies, illustrates the fact that there exists
a spectrum of devices that span about 8 orders of magnitude
of price/performance (100,000,000X). This is quite
significant in the light of the excitement that normally
accompanies an improvement of 10-20% in performance or a
decrease of 10-20% in price in current-day systems. The
participants in this "storage sweepstakes" may change in
time, but with such large price/performance stakes, there
will be continuing benefits to "playing the game" better.




2.1.4 Understanding of Program and System Behavior 

As noted earlier, the detailed operational behavior of
computer systems is often extremely complex. Thus,
decisions on hardware, software, and system design must
often be made in spite of insufficient knowledge. A better
understanding of program and system behavior is essential to
the intelligent and efficient development of future systems.

It is hoped that the research to be conducted as part
of this thesis will shed considerable light on these
matters.

2.2 Storage Hierarchy Approaches

"Storage hierarchy system" and similar terms have been
used in many contexts. Consistent with the objectives
outlined in the previous section, certain particular
contexts are assumed in this research.

2.2.1 Spectrum of Approaches

The problems of storage hierarchy management have been
attacked by a host of approaches. We can loosely
characterize these efforts into three categories:

2.2.1.1 Manual Hierarchy Management

Given a specific ensemble of storage device
technologies, after considerable thought the programmer can
explicitly or implicitly specify how his information (i.e.,
programs and data) should be organized and distributed
within the hierarchy and how and when his information should
be re-arranged. Having determined the distribution, he must
also specify his access to specific information accordingly.

When a programmer is directly operating upon his
information at the lowest level (e.g., using machine
language, direct I/O requests, etc.), he is explicitly
controlling the storage hierarchy; this is explicit manual
hierarchy management. In most conventional systems, the
programmer communicates with the system via programming
languages and control cards. Although this can relieve much
of the tedious or intricate details of storage management,
the overall control of the storage hierarchy is still
primarily the responsibility of the programmer. This is
implicit manual hierarchy management.

Manual storage management can be very economical since
it usually requires no special hardware features nor special
system software. Furthermore, it places the control of the
storage hierarchy in the hands of the programmer who is
presumably the one most familiar with his needs. Manual
storage management, in its many manifestations, is the most
common storage hierarchy approach in use today.

Manual storage management has many disadvantages, 
though. The amount of detail that the programmer must
understand and use can add significant complexity to this 
task. This then introduces additional areas of error and 
decreased productivity. Furthermore, the assumption that 
the programmer is the best judge of optimal storage 
organization is often wrong. The complexities and dynamics 
common to modern systems are often beyond the understanding 
of most application programmers. 

Multiprogramming, an almost universal technique in
current systems, necessitates strategies for global
optimization which usually differ substantially from the
individual local optimizations of each program. For these
reasons there has been a continual search for "a better way".

2.2.1.2 Semi-Automatic Hierarchy Management

Many techniques have been developed to minimize the
amount of effort required of the programmer and to provide
feedback to him. The programmer still has the ultimate
control in such a semi-automatic hierarchy management
system.

Certain of these techniques are based upon the concept
of the programmer providing "hints" to the system. These
hints form the basis for a partially automated, partially
manual storage management system. Although not especially
widespread, this approach has been used in several systems
(e.g., Jensen et al [53], O'Neill et al [70], etc.).

For a single application that is quite large and
complex, techniques have been developed to analyze the
actual performance and provide feedback to the programmer.
This approach is primarily used in specialized, dedicated,
predictable, high-performance systems, such as an airline
reservations system. Numerous attempts have been reported,
such as Arora et al [5], Ramamoorthy et al [72], etc.

The various semi-automatic hierarchy management
approaches help to reduce the programmer's effort and to
attain a better local optimization. Although useful for
certain applications, these strategies do not remove the
disadvantages already noted with manual hierarchy management
systems.

2.2.1.3 Automatic Hierarchy Management

Certain aspects of logical information organization are
inherent in a programmer's basic algorithm. In an automatic
hierarchy management system, all aspects of the physical
information organization and distribution that are
irrelevant to the underlying logical structure should be
removed from the programmer's responsibility. The
programmer may wish to, maybe even be encouraged to, use
algorithms that are known to perform well in conjunction
with the automated hierarchy management. But the central
responsibility of the storage hierarchy management is
removed from the programmer.

Since this approach directly focuses on the storage 
hierarchy objectives presented earlier, it will be the 
primary approach to be pursued in this thesis. 

2.2.2 Spectrum of Analysis Efforts

Each of the storage hierarchy approaches mentioned
above, primarily semi-automatic and automatic, has been
subjected to various forms of analysis. In this section we
briefly outline the principal deficiencies of these efforts.

2.2.2.1 Generalized Models

One popular form of analysis is to assume a generalized
model for hardware, software, and system behavior. If one is
careful in choosing the characteristics of the model (e.g.,
Poisson arrival and service times, etc.), it is possible to
develop precise analytical solutions. Unfortunately, it is
usually difficult to validate these models except for rather
simple solutions. Furthermore, since there are few truly
automatic storage hierarchy systems in general use, it is
extremely difficult to even determine realistic parameters
for these generalized models even if the models were valid.

Generalized models have been reported in several
papers, such as Aho et al [2] and Denning [25] in the
Bibliography.

2.2.2.2 Constrained Models 

Another variation on the generalized model scheme is to
analyze a particular program and then model its relationship
to the rest of the system. There are at least two
shortcomings in this approach. First, as in the generalized
model case, it is difficult to realistically model the
relationship between a program and the rest of the system.
Second, the analysis and measurement of the particular
program is normally converted into some form of probability
matrix or probabilistic reference pattern. In either case,
significant effort is required to accurately measure the
program's behavior. Furthermore, the probabilistic
characteristics are usually aggregated to reflect the
overall behavior of the program and, as a result, the
dynamic nature of the program and its impact on the storage
hierarchy are often lost.

Example analyses of constrained models can be found in
references: Arora and Gallo [5], Hatfield and Gerald [47],
Lewis and Yue [60], and Ramamoorthy and Chandy [72].

2.2.2.3 Limited Environment 

A common deficiency of most previous research is that
only a limited environment was considered, in particular
automatic hierarchy management over only two levels using a
single page size. Of course, most current-day computers have
only employed automatic hierarchy management in either Cache
Systems (cache store - main store) or Paging Systems (main
store - large store). Unfortunately, there are definite
reasons to believe that many of the conclusions and
techniques demonstrated for a two-level hierarchy do not
necessarily generalize to handle the spectrum of program
detail and device characteristics encountered in a truly
multiple level storage hierarchy. Furthermore, many of the
papers that attempted to investigate general storage
hierarchies assumed techniques and approaches that are
primarily based upon two-level hierarchy assumptions.

This limited environment has been studied by numerous 
people, such as Aho et al [2], Belady et al [10,11,12],
Coffman and Varian [19,86], Conti et al [21,22], Denning
[25], Fotheringham [33], Guertin [45], Kilburn et al [57],
Mattson et al [63], Seligman [78], Smith [81], and Wilkes
[88].

2.2.2.4 General Hierarchy Environment 



The studies of limited two-level storage hierarchies 
have been quite successful in many actual systems. A 
reasonable strategy would be to extend these techniques to a 
more general storage hierarchy environment. There have been 
a few attempts along these lines, but as mentioned in the
previous section, most were hampered by:

(1) attempting to directly apply two-level hierarchy
techniques without carefully considering their
applicability,

(2) attempting to generalize techniques which were not
even fully understood in a two-level environment.

The major thrust of this thesis is to provide insight
into and shed additional light on these problems.

CHAPTER 3.
FORMALIZATION OF STORAGE HIERARCHIES AND RELATED RESEARCH

3.0 Introduction

In this chapter a formalization of the key 
characteristics of storage hierarchies is presented and
performance measures are derived. The reported performance 
of actual systems is reviewed. 

3.1 Major Parameters of a General Storage System

Table 2 and Figure 1 illustrate the major parameters of
a storage hierarchy system. These parameters can be grouped
into four categories: (1) basic technology, (2)
configuration, (3) algorithm, and (4) program behavior.

3.1.1 Basic Technology 

The basic technology parameters, cost/byte, C, and
average access time, T, are primarily dependent upon the
physical properties of the storage device technology. At any 
given time the state-of-the-art offers only a limited number 
of (C,T) alternatives that the system designer can select. 



Basic Technology

  • C   cost/byte
  • T   average access time

Configuration

  • L   number of levels
  • I   interconnection of levels
  • S   size (capacity)
  • B   transfer rate (bandwidth)
  • N   number of bytes in page (page size)

Program Behavior

  • A   address trace

Algorithm

  • F   fetch
  • P   placement
  • R   replacement

Table 2.
Major Parameters of a Storage Hierarchy System

Figure 1.
Structure of a Storage Hierarchy System

3.1.2 Configuration

The system designer does have flexibility in organizing
these storage devices. By serial and/or parallel structuring
of the components of a given level of storage device
technology, it is possible to specify, over a wide range of
values, the size (storage capacity), S, and the maximum
transfer rate (data bandwidth), B, of the system. For
example, if a particular technology provides a basic device
with S=s and B=b, connecting n of these devices in parallel
produces a storage level with S=ns and B=nb. (To some extent
the mechanism and cost of the organizational structure do
influence the overall cost/byte and average access time of a
level; this effect is usually minimal for small values of
n.)

On a more global basis, the designer must determine the
number of levels, L, in the storage system, the
interconnections of the levels, I, and the size, N, of a
page (the unit of information moved between levels).

3.1.3 Program Behavior

The processor, under program control, produces a
sequential series of references to the storage system. These
processor references are in the form of logical address
references which serve to uniquely identify each individual
unit of stored information (e.g., an 8-bit byte)
independent of its location (i.e., M^1, M^2, M^3, ...). The
time sequence of logical address references, A, is called an
address trace or address reference pattern. In general, each
unique program and its input data will result in a different
processor address trace. For purposes of analyzing the
effectiveness of the storage hierarchy, the address trace is
the primary characterization of a program that is needed
(e.g., we don't care what the program's purpose is or what
language it is written in, etc.; we only care about its
address trace). Thus, the address trace describes the
program's behavior as observed by the storage hierarchy.

3.1.4 Algorithm

There are three basic decision algorithms that must be
employed by an automatic storage management system. Fetch,
F, decides when and which information should be moved up a
level (e.g., from M^2 to M^1). Placement, P, decides where
information should be placed in a level. Removal or
replacement, R, decides when and which information should be
transferred down a level (e.g., from M^1 to M^2).

3.2 The Storage Hierarchy Model

A completely general storage hierarchy algorithm, H,
must consider all the parameters described above:
H = f(<Technology>, <Configuration>, <Program>, <Algorithm>)
H = f(<C,T>, <L,I,S,B,N>, <A>, <F,P,R>)
Clearly, attempting to optimize a system with so many
parameters is difficult. Fortunately, it is possible to
eliminate from concern or at least simplify certain
parameters as explained below.

3.2.1 Configuration

Consistent with the title of this thesis, we shall
consider only hierarchical interconnections of levels as
illustrated in Figure 1, where T^1<T^2<T^3< etc. and
N^1<N^2<N^3< etc. The rationale for this decision is
elaborated in the thesis.

There are three basic strategies for information
movement sizes: (1) select a single page size value, N,
which is always used throughout the hierarchy - this
approach is used on most contemporary automatic multilevel
storage systems (e.g., Multics), (2) allow an arbitrary
range of values for N to be used - this approach is
primarily used on manually managed storage systems, and (3)
select L values of N, so that a specific unit of transfer is
used between any two levels of the hierarchy - this approach
will be pursued and justified in this thesis.

3.2.2 Program Behavior

Each logical address can be represented as a bits as
shown in Figure 2(a). If the page sizes, N, are chosen to
be powers of 2, the set of 2**a possible addresses can be
partitioned into 2**p pages of N=2**n consecutive logical
addresses each (so that a = p + n) as shown in Figure 2(b).
[Note: the notation "2**a" means 2 raised to the power a.]
Since the information movement between storage levels is
accomplished by transferring pages, we can analyze this
interlevel movement by merely considering the time sequence
of logical page references, Ap, called a page trace.
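
As a minimal sketch of this partitioning, the Python fragment below derives a page trace from an address trace when the page size is N=2**n. The function name `page_trace` and the sample addresses are illustrative assumptions and do not appear in the thesis.

```python
def page_trace(address_trace, n):
    """Map each logical address to its page number for page size N = 2**n.

    Dropping the low-order n bits of an address leaves the p high-order
    bits, which identify the page containing that address.
    """
    return [a >> n for a in address_trace]


# Hypothetical address trace (illustrative values only).
A = [0, 5, 64, 70, 130, 3]

# With n = 6 the page size is N = 64 consecutive logical addresses.
Ap = page_trace(A, 6)
print(Ap)
```

Shifting right by n bits gives the same result as integer division by N, which is why restricting page sizes to powers of two keeps the mapping simple.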

Since we allow the page size to be different between
each level and requests are only passed down to a given
level if they cannot be satisfied by any higher level, each
level will usually experience a different page trace, though
all are algorithmically derivable from the same address
trace. In fact, if all address references were broadcast to
all storage levels, the page traces could be determined by a
simple mapping from logical addresses into logical pages:
page address = integer(logical address/N)
where N is the page size for that level.

3.2.3 Algorithm

The placement decision, P, is usually unconstrained or
minimally constrained and, as a result, has relatively
little impact upon performance.

A demand fetch policy will be used. Assume that at time
t a request for logical address a (or, equivalently,
p^1=integer(a/N^1)) arrives at level M^1. At that instant the
information may currently reside in M^1; otherwise it must be
found in a lower level. Under demand fetch, if p^1 is in M^1,
the reference proceeds, the information is passed back to
the processor, and no other page movement occurs in the
hierarchy. If p^1 is not in M^1, a request for
p^2=integer(a/N^2) is sent from M^1 to M^2. If p^2 is in M^2, the
page is transferred to M^1 and processing continues as
described above; otherwise a request for p^3=integer(a/N^3) is
sent from M^2 to M^3, etc. Note that under the demand fetch
policy, information is only moved up in the hierarchy when
and if it is explicitly demanded (i.e., requested) by the
processor.
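
The request cascade described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `demand_fetch`, the page sizes, and the addresses are hypothetical; each level is treated as unbounded, so no removal algorithm is needed; and the lowest level is assumed to hold all information.

```python
def demand_fetch(a, page_sizes, contents):
    """Serve logical address `a` under demand fetch.

    page_sizes[i] is the page size N of level i+1, and contents[i] is the
    set of page numbers currently held by level i+1. The lowest level is
    assumed to hold all information. Returns the 1-based number of the
    level in which the information was found.
    """
    lowest = len(page_sizes) - 1
    found = lowest
    for level in range(lowest):
        if a // page_sizes[level] in contents[level]:
            found = level
            break
    # Move the information up: every level above the one that satisfied
    # the request now holds the page (of its own size) containing `a`.
    for level in range(found):
        contents[level].add(a // page_sizes[level])
    return found + 1


# Hypothetical three-level hierarchy with page sizes 8, 16 and 32.
sizes = [8, 16, 32]
held = [set(), set(), set()]
print([demand_fetch(a, sizes, held) for a in (40, 41, 50, 36)])
```

In this run the first reference must go to the lowest level, the second is satisfied by the top level, and the last is satisfied by the middle level because an earlier fetch left the containing page there.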

Although demand fetch is only one possible fetch
algorithm, it can be shown [63] that for hierarchically
structured storage systems:

"... given any trace and replacement algorithm (not
necessarily using demand paging) another replacement
algorithm exists that uses demand paging and causes
the same or fewer total number of pages to be
transferred ..."



In other words, as you might intuitively suspect, moving
pages only when necessary results in the minimal number of
page movements. Of course, if page movement is required and
the higher level that is to receive the page is already 
full, the removal algorithm must be employed to provide 
space for the new page. 

3.2.4 Revised Storage Hierarchy Model

Based upon the discussion above, we can slightly
simplify the parameters remaining for consideration in the
storage hierarchy algorithm, H, so that it need consider
only:
H = f(<Technology>, <Configuration>, <Program>, <Algorithm>)
H = f(<C,T>, <L,S,B,N>, <A>, <R>)
In this thesis all of these parameters will be considered
and investigated. Special emphasis will be placed on
analyzing and understanding the relationship between the
page sizes, N, and the removal algorithm, R, required for
efficient operation of the storage hierarchy.

3.3 Performance Measures

There are various performance measures that we could
consider. From an overall point of view, system measures,
such as job throughput, job turn-around time, and processor
utilization, are quite significant. Unfortunately, it is
extremely difficult to directly relate these measures to the
performance of the storage system; even an approximation
would require consideration of many more parameters. Thus,
we will only consider measures that relate to the effective
performance of the storage hierarchy.

3.3.1 Performance Measurement Notation

Due to the strict hierarchical structure of our storage
system and the demand fetch policy, we can analyze the
performance of the system by separately considering the
levels of the hierarchy starting with M^1. Since a given
level only receives a page fetch request if the information
has not been found in a higher level, each level usually
sees a different page trace, Ap^1, Ap^2, Ap^3, etc.

There are several important properties of page traces. 
If P is a particular page trace (e.g., Ap^1) of a program, we
define:

  • |P|   length of the page trace sequence
  • Q     set of distinct pages referenced in P
  • |Q|   number of pages in Q

For example, in the page trace

P = a, b, a, c, b, a
we observe that
|P| = 6
Q = {a, b, c}
|Q| = 3
(Lower case letters will be used to represent logical page
addresses instead of page numbers).

For a specific storage hierarchy, we define |M| to be
the size of M in units of pages receivable from the next
lower level. For example, |M^1|=S^1/N^2, |M^2|=S^2/N^3, etc.

For a specific page trace, P, storage level, M, and
removal algorithm, R, we define the result page trace or
[illegible] page trace, P', as the time sequenced page
references of P that were not found in M. We shall call page
references that are found in M successes. The success
function, Sf, is the number of references satisfied by M and
can be computed as |P|-|P'|. By analogy to the success
function, the number of references not satisfied by M, |P'|,
is called the failure function, Ff. In general, we wish to
maximize the success function or, equivalently, minimize the
failure function. It is convenient to normalize the failure
function by defining the failure frequency function, f,

f = |P'|/|P|

The success frequency function, s, can be easily computed as
1-f; it is often called the hit rate on a two-level storage
system. We also define the system failure frequency
function, f^o, of a level to be:

f^o = |P'|/|A|
where A is the address trace generated by the processor and
|A| is the length of the address trace (it is also true that
|A| always equals |P^1|, thus they may be used
interchangeably). The system success frequency function is
correspondingly defined as s^o=1-f^o.

If we apply the definitions above to the processor
generated page trace, P^1, received by M^1, we note that the
result page trace, P', is essentially the page trace, P^2,
received by M^2. There is a minor relabeling required to
adjust for the difference in page size used by M^2,
P^2=P'(N^1/N^2). By repeating this process recursively, we can
develop the effective page traces, failure and success
functions, and failure and success frequency functions for
each level of the hierarchy. Since we assume that all
referenced information exists in the storage hierarchy, the
sum of the system success frequency functions must be 1.

One general measure of a storage hierarchy's
performance is its effective access time, T', and effective
cost, C', which are defined as follows:

T' = T^1 s^{o1} + T^2 s^{o2} + T^3 s^{o3} + ...

C' = (C^1 S^1 + C^2 S^2 + C^3 S^3 + ...)/(S^1 + S^2 + S^3 + ...)
T' and C' can be viewed as characterizing the entire storage
hierarchy according to a corresponding one-level system.
From a cost/performance point of view, one should be
indifferent between a single-level single-technology storage
device with average access time, T', and average cost/byte,
C', and a storage hierarchy system with performance
parameters (T',C'). In particular, if the system designer
needs a storage performance (T,C) and no such basic
technology exists, he must attempt to develop a storage
hierarchy such that (T',C') = (T,C).
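
The two definitions translate directly into code. The sketch below is illustrative only: the function names and all numeric values are assumptions chosen for the example, not measurements from the thesis.

```python
def effective_access_time(T, s):
    """T' = T^1 s^{o1} + T^2 s^{o2} + ..., given per-level access times T
    and system success frequencies s (which should sum to 1)."""
    return sum(t * freq for t, freq in zip(T, s))


def effective_cost(C, S):
    """C' = (C^1 S^1 + C^2 S^2 + ...)/(S^1 + S^2 + ...), given per-level
    cost/byte C and sizes S."""
    return sum(c * size for c, size in zip(C, S)) / sum(S)


# Hypothetical two-level hierarchy (illustrative numbers only).
T = [1.0, 100.0]   # access times
s = [0.9, 0.1]     # system success frequencies
C = [10.0, 1.0]    # cost per byte
S = [1, 9]         # sizes

print(effective_access_time(T, s))
print(effective_cost(C, S))
```

With these made-up figures the hierarchy behaves like a single device that is much cheaper per byte than the fast level and much faster than the slow level, which is the trade the designer is trying to tune.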

3.3.2 Page Trace Simulation

One way to determine the success frequency function and
the result page trace for a specific page trace is to
simulate the storage management algorithms and note the
contents of M at each step of the page trace. Clearly,
these results depend upon numerous parameters (e.g.,
specific trace, removal algorithm, size of M, etc.). Figure
3 illustrates this step by step simulation assuming demand
paging, FIFO (first-in first-out) removal, and |M| = 2
pages. For simplicity, the page trace, P, has been
normalized to be expressed in units of receivable pages. In
particular, if M is M^1, then |M|=S^1/N^2 and p=integer(a/N^2)
where a is a logical address reference and p is the
corresponding page reference. The pages in M are shown as
ordered to indicate the FIFO ordering; the top page is the
"last" ("latest") page fetched into M, whereas the bottom
page is the "first" ("oldest") page in M and is the page
selected for replacement when necessary. The asterisk (*)
indicates that a fetch was required from a lower level of
the hierarchy; the page reference is thus noted as part of
the result page trace, P'.

Figure 3.
Example of Page Trace Simulation
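
A simulation of this kind can be sketched in Python as follows. The function name `simulate_fifo` is an assumption, and the trace used is the short example P = a, b, a, c, b, a from Section 3.3.1 rather than the trace of Figure 3, which is not reproduced in the text.

```python
from collections import deque


def simulate_fifo(trace, capacity):
    """Simulate one level M under demand paging with FIFO removal.

    Returns (result_trace, hits), where result_trace is P', the time
    sequenced references of the trace that were not found in M.
    """
    resident = deque()              # leftmost entry is the oldest page
    result_trace, hits = [], 0
    for p in trace:
        if p in resident:
            hits += 1               # success: no page movement needed
            continue
        result_trace.append(p)      # failure: fetch from a lower level
        if len(resident) == capacity:
            resident.popleft()      # remove the oldest page first
        resident.append(p)
    return result_trace, hits


P = list("abacba")
P_prime, hits = simulate_fifo(P, 2)   # |M| = 2 pages
f = len(P_prime) / len(P)             # failure frequency function
s = 1 - f                             # success frequency function
print(P_prime, hits, f, s)
```

For this trace the first two references form the start-up transient, and FIFO then removes page a just before it is referenced again, which illustrates how the removal algorithm shapes the result page trace seen by the next level.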

It is normally assumed that all levels, except level L,
are empty initially; thus there is a transient stage during
which pages are loaded into M without any replacements
needed. Since there are so few pages in M during this
start-up stage, there are many fetches required. We will
find it useful to separate out this transient phenomenon.
This transient consists of the page trace up to the first
|M| unique page references; in the example of Figure 3 this
is the first 2 page references (i.e., a, b). Consider the
case |Q|<|M|: there would be no further fetches into this
level after the initial transient that loads the |Q| pages
into M. In this case, |P'|=|Q| exactly, independent of |P|,
and s tends toward 1 as |P| increases.

In the particular example illustrated in Figure 3, we
note that there were 3 'hits' and 7 'misses' out of 10 page
references, so that s=30%. Thus, P' only consists of 7 page
references to the lower levels.

3.4 Related Research

As noted above, we wish to develop a storage hierarchy
with attractive cost/performance, (C',T'), characteristics.
It is clear that we can arbitrarily decrease the cost/byte
by making the size of each level, S, increasingly larger as
we go from the high-performance high-cost to the
low-performance low-cost levels (i.e., C^1>C^2>C^3>... and
S^1<S^2<S^3<...). In fact, this approach is the basic
motivation for storage hierarchies.

Unfortunately, if the processor generated address
references that were uniformly distributed in time and
address, each byte would be equally likely to be referenced
at any instant. This probability would be:

Pr[reference a] = 1/(S^1+S^2+S^3+...)
Thus, the expected system success frequency function, s^o,
for each level is proportional to the size of the level. For
example,

s^{o1} = S^1/(S^1+S^2+S^3+...).
But, since we have assumed that S^1<S^2<S^3<..., we find that
s^{o1}<s^{o2}<s^{o3}<... Thus, the system success frequency
function for the Lth level dominates (i.e., is approximately
1) since we have assumed that it is the largest level.
Referring back to our definition of effective access time, we
find that T' would be approximately equal to the access time
of the lowest performance level (level L) since all the other
terms would be negligible. If this analysis were true, our
storage hierarchy would result in a performance just slightly
better than our lowest performance level at a moderate
increase in price - not an especially exciting result.
Fortunately, actual storage hierarchies do not behave this
way. We will briefly review some related research on this
subject.
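
The pessimistic uniform-reference case is easy to reproduce numerically. In the sketch below the level sizes and access times are hypothetical values chosen only to make the arithmetic visible, and the function name is an assumption.

```python
def uniform_success_frequencies(S):
    """Expected system success frequency of each level when every byte
    is equally likely to be referenced: s^{oi} = S^i/(S^1+S^2+...)."""
    total = sum(S)
    return [size / total for size in S]


# Hypothetical three-level hierarchy (illustrative numbers only).
S = [1, 100, 10_000]          # sizes, smallest (fastest) level first
T = [1.0, 100.0, 10_000.0]    # access times, fastest level first

s = uniform_success_frequencies(S)
T_eff = sum(t * freq for t, freq in zip(T, s))
print(s)
print(T_eff)
```

With these numbers the effective access time comes out at roughly 99% of the slowest level's access time, matching the argument that without locality the hierarchy would perform only slightly better than its lowest level.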

3.4.1 Locality

It has been empirically observed that actual programs
cluster their references so that, during any interval of
time, only a subset of the information available is actually
used. A detailed discussion of this phenomenon will be
presented in the thesis.

It is important to note that due to our basic rankings
of page sizes and access times in the storage hierarchy,
each level "sees" a different view of the program. The high
levels of the hierarchy must follow the microscopic
instruction by instruction reference pattern whereas the
middle levels follow a more gross subroutine by subroutine
pattern. The very low levels are primarily concerned about
the processor's references as it moves from subsystem to
subsystem. We do not have any a priori guarantee that
locality of reference holds equally true for all of these
views, but we do have some reported evidence to encourage
us. Most of these studies have been based upon two-level
storage systems or restricted forms of three-level
hierarchies.

3.4.2 Paging Systems

The earliest automatic storage systems were based upon
two-level core-drum hierarchies (devices 2 and 4 of Table
1). This technique was introduced in the Atlas system
[38,57] during the early 1960's. It has since been used on
many contemporary systems.



The performance of paging systems has been studied by
various researchers, such as Belady [12], Coffman and Varian
[19,86], Hatfield [48], and Sayre [77]. In Coffman's
results, for example, it was noted that even though
S^1/(S^1+S^2)=0.25, s^1 often exceeded 95%. Hatfield studied
the performance of system programs that had been carefully
designed and found that for S^1/(S^1+S^2) ratios as low as
0.25, it was possible for s^1 to often exceed 99.99%.

3.4.3 Cache Systems

Cache systems are based upon two-level cache-main
hierarchies (devices 1 and 2 of Table 1). Although they had
been proposed as early as 1965 (see Wilkes [88]), the major
commercial use of cache systems did not occur until the
introduction of the IBM System/360 Model 85 [21,61]. More
recently, this technique has been used in several
contemporary systems, such as the IBM System/370 Model 155
and Model 165 [52].

In these cache systems, IBM found that it was possible
to drastically reduce S^1/(S^1+S^2) to as low as 1% and still
keep the hit ratio, s^1, above 90%. Similar findings were
also reported by Bell and Casasent [13], Mattson [64], Meade
[65], and Seligman [78].

3.4.4 Three-level Systems 



There have been a few three-level systems reported in
the literature; unfortunately they have all been somewhat ad
hoc in design and the results are far from conclusive.
There have been at least three types of such hierarchies
studied.

3.4.4.1 Main-Bulk-Mass Store Hierarchy

There have been several systems devised based upon
devices 2, 3, and 5 of Table 1. The Bulk Store actually
used, called Large Core Store (LCS), had a much lower access
time (around 8 us) and a much higher price (about 25¢/byte).
In order to compensate for peculiarities in the hardware
structure and out of considerable concern for the extreme
cost of LCS, these systems tended to become much more
manually managed hierarchies than automatically managed.
Although they were found to be effective, it is difficult to
generalize the results. The most ambitious attempt reported
was undertaken by Carnegie-Mellon University [36]. Results
have also been reported by Durao [31], Williams [89], and
others.

3.4.4.2 Main-Large-Mass Store Hierarchy

There do not appear to be any automatically managed
systems of this type published in the general literature.
The Multics system at MIT Project MAC has recently
introduced a "page-multilevel" strategy based upon devices
2, 4, and 5 of Table 1. Only limited findings have been
reported to date, but it has been stated in the March 1972
issue of the MIT Information Processing Services Bulletin
(p. 11) that it

"... does pay off since it meets fluctuating demands
on the system, reduces the workload for the disks to
an efficient level, is inexpensive, and keeps pages
on the drum for an acceptable length of time."

As an indication of its effect, the new strategy is reputed
to have increased the success frequency function, s^2, of the
drum from 80% to more than 90% (i.e., "reduced from one page
read from the disk for every four reads from the drum, to
one page read from the disk for every ten to twenty pages
from the drum").

3.4.4.3 Main-Large-Giant Store Hierarchy

The work of Considine and Weis [20] is difficult to
categorize. It is based upon a three-level hierarchy where
the first level corresponds to device 2 (main store) of
Table 1, the second level corresponds to a combination of
devices 4 (drums) and 5 (disks), and the third level
consists of removable disks which can best be approximated
by device 6 of Table 1. It is impossible to compute any
success frequency functions from their data, but it appears
that for S^2/(S^2+S^3)=0.5, s^2 is very high. They note
(p. 443), in particular, "most of the data moved to the
archival storage (i.e., M^3) have stayed there."

3.4.5 Need for Additional Research

Although the results of the research described above are
encouraging, the design and performance of general
multiple-level storage hierarchies are still inconclusive.
This thesis is intended to provide specific results in this
area.

CHAPTER 4.
A STORAGE HIERARCHY SYSTEM

4.0 Introduction 

In this chapter a design for a general multiple level
storage hierarchy system, in particular with three or more
levels, is presented. This design is based upon an orderly
and uniform treatment of the logical structure of the
storage levels and their interconnections. In addition to
providing a solution to convenient storage management for
the user, this design is intended to produce good
performance for the storage hierarchy as measured by its
effective access time, T', and effective cost, C'. The
principal and novel techniques to be used are described
separately in the sections below.

4.1 Continuous Hierarchy

As noted earlier, automatic storage hierarchy systems
are still in the minority. Amongst those systems that do
provide automatic storage hierarchy management, the majority
limit their scope to two levels with a few rare three level
systems. As a result of these limitations, the user is
still forced to rely on manual or semi-automatic storage
management techniques to deal with the storage levels that
are not automatically managed. Thus, an automatic storage
management system should consist of a continuous hierarchy
that encompasses the full range of storage levels.

4.1.1 Cost/Performance of Adjacent Levels

A major obstacle to generalizing storage management
algorithms, in particular in two-level paging systems, is
the tremendous contrast, often over 3 orders of magnitude,
in cost/performance between M^1 and M^2. As illustrated in
Table 1 (page 28), a representative Main Store, M^1, has an
access time of 1.44 us compared to a Large Store, M^2, with
an access time of 5 ms. In such a two-level system, the
effective access time, T', is

T' = T^1 s^{o1} + T^2 s^{o2}

T' = 1.44 s^{o1} + 5000 s^{o2}
and since s^{o1}+s^{o2}=1, we can substitute s^{o1}=1-s^{o2} to get

T' = 1.44 - 1.44 s^{o2} + 5000 s^{o2}

T' = 1.44 + 4998.56 s^{o2}
(all times in us). In order to attain an effective access
time, T', that is comparable to the Main Store access time,
T^1, we must keep the system success frequency function,
s^{o2}, very close to 0 or, correspondingly, keep s^{o1} very
close to 1. Even with s^{o1} at 99.8%, an improvement to 99.9%
would cut the effective access time, T', nearly in half
(from about 11.4 us to about 6.4 us). With such pressure to
attain very high s^{o1} values, the systems designer is often
forced to seek out very specialized techniques in contrast
to our goals of orderly and uniform algorithms.
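
The sensitivity of T' to small changes in the success frequency can be checked with a short calculation. The sketch below uses the access times quoted in this section (1.44 us and 5 ms = 5000 us); the function name and parameter names are assumptions made for the example.

```python
def two_level_access_time(s1, t1=1.44, t2=5000.0):
    """Effective access time T' (in us) of a two-level hierarchy.

    s1 is the system success frequency of M^1; the remaining
    references, 1 - s1, are satisfied by M^2.
    """
    s2 = 1.0 - s1
    return t1 * s1 + t2 * s2


for s1 in (0.998, 0.999):
    print(s1, round(two_level_access_time(s1), 3))
```

Raising the success frequency of M^1 by only a tenth of a percentage point removes about 5 us from T', because each reference that falls through to M^2 costs thousands of times more than one satisfied by M^1.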

4.1.2 Moderate Cost/Performance Ratios

In order to make the storage hierarchy design robust
and flexible, the cost/performance characteristics should
differ by less than two orders of magnitude between adjacent
levels. Thus, success frequency functions in the range 90%
to 99% are adequate to ensure reasonable performance. If
the differences are much greater, it will be difficult to
find sufficiently efficient general algorithms. Since minor
changes in production techniques and technology evolution
can result in a variation of a factor of two or three in the
cost/performance for a given technology, it is not desirable
to decrease much below one order of magnitude difference
between adjacent storage levels.

4.2 Shadow Storage and Page Splitting

The time, Tm, required to move a page between two
levels of the hierarchy usually consists of summing two
components: (1) the average access time, T, and (2) the
transfer time, BxN.

If all page sizes were set to provide exactly the
amount of information, N^1, requested by the processor, the
page movement time would be

Tm = T + BxN^1
where T and B would depend upon the particular storage
levels. By examining the representative devices shown in
Table 1 (page 28), we see that access time varies much more
than transfer rate (i.e., access time spans 6 orders of
magnitude whereas transfer rate varies by only 3 orders of
magnitude).

4.2.1 Marginal Increase in Page Transfer Time and Reference
Probability

Let us assume that N^1 is quite small, such as 8 bytes.
We can ask the question: What is the marginal increase in Tm
if we transfer the adjacent N^1 bytes in addition to the N^1
bytes requested by the processor? Table 3 answers this
question. Notice that the marginal increase in Tm decreases
from a high of 5.5% (level 2 to level 1) to a low of .002%
(level 6 to level 5). This fact is only interesting if we
also consider the concept of locality (see Chapters 5 and 6
for additional discussion) and the question: What is the
probability, Pr, that the processor will reference the
adjacent N^1 bytes within a short interval of time, such as
Tm seconds? Due to locality of program reference, we would
expect Pr to be much larger than merely the reciprocal of
the logical address space size. Furthermore, Pr should
increase as Tm increases. Thus, for a given level, if Pr is
larger than the marginal increase in Tm, it is beneficial to
transfer the additional N^1 bytes and thereby avoid the
necessity of expending Tm seconds to transfer these N^1 bytes
later separately.

Level         Tm           Tm           Marginal Increase
Transfer      (1 unit)     (2 units)    in Tm

2 to 1 (*)    1.44 us      1.52 us      5.5%
3 to 2        131 us       132 us       .8%
4 to 3        5006 us      5011 us      .1%
5 to 4        38010 us     38020 us     .03%
6 to 5        600013 us    600027 us    .002%

Table 3.
Marginal Increase in Page Transfer Times

* The figures for access time and transfer rate for the
Main Store listed in Table 1 are approximations that are
only meaningful for very large page sizes. For the page
sizes under consideration in this chapter, the figures used
in the table above are more appropriate.
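
The marginal increases in Table 3, and the decision rule that compares them with Pr, can be sketched as follows. The Tm values are taken from Table 3; the function names and the sample value of Pr are assumptions used only for illustration.

```python
# Tm in us for moving 1 unit and 2 units of N^1 bytes, from Table 3.
TABLE_3 = {
    "2 to 1": (1.44, 1.52),
    "3 to 2": (131.0, 132.0),
    "4 to 3": (5006.0, 5011.0),
    "5 to 4": (38010.0, 38020.0),
    "6 to 5": (600013.0, 600027.0),
}


def marginal_increase(tm_one, tm_two):
    """Fractional increase in Tm when the adjacent unit is also moved."""
    return (tm_two - tm_one) / tm_one


def worth_transferring(pr, tm_one, tm_two):
    """Decision rule from the text: move the adjacent bytes when the
    probability Pr of referencing them soon exceeds the marginal
    increase in Tm."""
    return pr > marginal_increase(tm_one, tm_two)


for transfer, (one, two) in TABLE_3.items():
    print(transfer, f"{100 * marginal_increase(one, two):.3f}%")

# Hypothetical Pr of 1% (an assumed value, not a measurement).
print(worth_transferring(0.01, *TABLE_3["2 to 1"]))
print(worth_transferring(0.01, *TABLE_3["3 to 2"]))
```

Under the assumed Pr of 1%, the extra transfer would not pay off between levels 2 and 1 but would between levels 3 and 2, which illustrates why larger pages become attractive lower in the hierarchy.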

These same arguments can be applied to the question of
transferring the adjacent nxN^1 bytes, etc. Since the
marginal increase in Tm decreases monotonically as a
function of storage level, the number of N^1 byte packets to
be transferred as a single page should increase
monotonically. This confirms our earlier decision that
N^1<N^2<N^3< etc.

4.2.2 Choice of Page Size 

In order to simplify the implementation of the system
and to be consistent with the mapping from logical address
to page address illustrated in Figure 2 (page 46), we will
require that all page sizes be a power of two. Thus, each
page size (e.g., N^3) is some power of two larger than the
page size of the next higher level (e.g., N^3=N^2 x 2**i).
Clearly, the specific values of Pr and thus the choice for
each page size depends upon the characteristics of the
programs to be run and the effectiveness of the overall
storage system. Preliminary measurements indicate that a
ratio of 4:1 between levels is reasonable; Meade [65] has
reported similar findings. Other important factors
affecting page size are discussed in Chapters 5 and 6.

4.2.3 Page Splitting

Now let us consider the actual movement of information
in the storage hierarchy. At time t, the processor
generates a reference for logical address a. Assume that
the corresponding information is not currently stored in M1
or M2 but is found in M3. For simplicity, assume that page
sizes are doubled as we go down the hierarchy (e.g., N2=2N1,
N3=2N2=4N1, etc.; see Figure 4). The page of size N3
containing a is copied from M3 to M2. M2 now contains the
needed information, so we repeat the process. The page of
size N2 containing a is copied from M2 to M1. Now, finally,
the page of size N1 containing a is copied from M1 and
forwarded to the processor. In this process the page of
information is split (i.e., page splitting) repeatedly as it
moves up the hierarchy.

Figure 4.
Page Splitting and Shadow Storage

4.2.4 Shadow Storage

As a result of this splitting, the page of size N1 that
is received by the processor has left a "shadow" consisting
of itself and its adjacent pages behind in all the lower
levels (i.e., shadow storage). Presumably, if the program
exhibits locality of reference, many of these shadow pages
will be referenced shortly afterward and be moved further up
in the hierarchy also.
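
The page splitting and shadow storage described above can be
sketched in a few lines of Python. This is only an
illustrative sketch: the function name, the 64-byte value
used for N1, and the example address are assumptions made
for the illustration, and page sizes are doubled at each
level as in the simplified example of the text.

```python
def containing_pages(address, n1=64, levels=3):
    """Return, for each level i = 1..levels, the (start, size) of the
    page of size Ni that contains `address`, assuming Ni = 2 * N(i-1)."""
    pages = {}
    for level in range(1, levels + 1):
        size = n1 * 2 ** (level - 1)      # N1, 2*N1, 4*N1, ...
        start = (address // size) * size  # pages are aligned on their own size
        pages[level] = (start, size)
    return pages

# A reference to (hypothetical) logical address 200 that is satisfied from M3:
pages = containing_pages(200)
for level in (3, 2, 1):                   # order in which the pages move up
    start, size = pages[level]
    print(f"page of size N{level}: bytes {start}..{start + size - 1}")
```

Each page lies inside the next larger one, so the bytes left
behind at each lower level form the shadow of the page that
is finally delivered to the processor.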

4.2.5 Copying of Pages

In the strategy presented, pages are actually copied as
they move up the hierarchy; a page at level n has one copy
of itself in each of the lower levels. Since processor
"fetch" requests substantially outnumber "store" requests
(e.g., by more than 5:1 in some measured programs), the
contents of pages are seldom changed. Thus, if a page has
not been changed and is selected to be removed from one
level to a lower level, it need not be actually transferred
since a valid copy already exists in the lower level. The
contents of any level of the hierarchy are always a subset of
the information contained in the next lower level. Thus,
the total information capacity of the system is equal to the
size of the level L store rather than the sum of the
capacities of all the levels. Since the capacity of level L
is assumed to be much larger than the capacity of L-1, etc.,
the difference in total system capacity due to shadow
storage is minimal.

4.3 Direct Transfer

In the description above it is implied that information
actually moves between adjacent levels. This approach,
called direct transfer, is indeed intended. By comparison,
though, many proposed and experimental multiple level
storage systems are based upon an indirect transfer (e.g.,
the Multics "page multilevel" system mentioned in Chapter
2). In these systems, all information is routed through
level 1. For example, to move a page from level n to level
n-1, the page is moved from level n to level 1 and then from
level 1 to level n-1. Clearly, this indirect approach is
undesirable since it requires extra page movement and
consumes a portion of the limited M1 capacity in the
process.

There have been two major obstacles to direct transfer
in previous systems: (1) interconnection structure and (2)
synchronization.

4.3.1 Interconnection Structure

For many reasons, some technical and some historical,
most contemporary systems are physically structured in a
radial manner. That is, there is a central element to the
system, either the processor itself or the primary store,
and all other storage devices and/or processors are directly
connected to this central element. Except for some possible
control signals, there are no direct data transfer
connections between the non-central elements. This
structure is, of course, quite consistent with a
non-hierarchical storage management system. A logical
storage hierarchy system should be based upon a physically
hierarchical interconnection structure.

4.3.2 Synchronization

As indicated in Table 1, storage devices often have
different timing and transfer rate characteristics. In order
to accomplish a direct data transfer between levels,
synchronization is necessary. It may be obvious that a
storage device can not transfer data faster than its rated
performance, but for many storage devices, especially
electromechanical devices, it is also not possible to
transfer data slower than the rated speed.

Based on current technology, this problem can be
solved. Many of the storage devices are now
non-electromechanical (i.e., strictly electrical), such as
the Cache, Main, and Bulk Stores of Table 1. It is quite
feasible to provide direct transfer between any of these
devices and any other storage device; this is one reason for
the radial interconnections described above where the Main
Store acted as the common means of providing
synchronization. Using a similar approach, we can allow
direct transfer between electromechanical devices if this
transfer is routed through a small and reasonably
inexpensive electrical storage buffer. Femling [33]
discusses such a device, which he calls a rubber-band memory,
presumably because it "stretches" to match the
characteristics of the source and destination devices.

4.4 Read Through

In the description above, it is implied that a transfer
up the hierarchy from level 2 to the processor (level 0)
consists of two sequential steps: (1) transfer a page of size
N2 from level 2 to level 1, and then (2) extract the
appropriate page subset of size N1 and transfer it from
level 1 to the processor (level 0). In general, a transfer
from level n to the processor would consist of a series of n
steps. Thus the system page transfer time would equal the
sum of n inter-level page transfer times (e.g.,
Tm = Tm32 + Tm21 + ...). Furthermore, for many
electromechanical storage devices, the second access,
required to forward the page subset, may experience the
"maximum" access delay rather than the "average" (i.e.,
after storing the information into the level, a complete
mechanical revolution may be required to reposition to read
the same information and forward it to the next level).

This inefficiency can be avoided by allowing
information to be stored into all upper levels
simultaneously. Figure 5 illustrates this mechanism. If
information is to be transferred from M3 to the processor,
M3 turns on its output data gate, G3out, when it is ready to
start and transfers N3 bytes and their corresponding logical
addresses up the data bus. M2 turns on its input data gate,
G2in, to receive these N3 bytes; furthermore, when the
appropriate N2 bytes needed by M1 are detected by M2, it
turns on its output data gate, G2out, and these N2 bytes are
forwarded to M1 while being stored in M2, etc.

For example, assume a reference to logical address a is
generated by the processor and the corresponding information
is currently stored at level n (and all lower levels, of
course). At the instant that the N1 bytes containing a are
placed on the data bus by level n, these N1 bytes will be
stored into all levels from level n-1 to level 0 (the
processor) simultaneously. Likewise, the N2 bytes
containing a are simultaneously stored into all levels from
level n-1 to level 1. This strategy thus makes it appear
that the N1 byte page requested by the processor is read
through directly to the processor without any delays.

4.4.1 Page Transfer Time

Using the read through strategy, the page transfer time
to the processor is actually less than the page transfer
time to the adjacent storage level. For example, if the
requested information is stored in M3, the page transfer
time to the processor, via read through, is

$$Tm_{30} = T_3 + N_1 / B_3$$

whereas the page transfer time from M3 to M2 is

$$Tm_{32} = T_3 + N_3 / B_3 .$$

Since N1 < N3, it follows that Tm30 < Tm32.
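
As a rough numerical check of the two page transfer times
above, the sketch below evaluates both, assuming that each
is the access time of M3 plus the number of bytes moved
divided by the transfer rate of M3. The access time,
transfer rate, and page sizes are made-up values chosen only
for illustration; they are not the figures of Table 1.

```python
def transfer_time(access_time, nbytes, rate):
    """Access time plus the time to move nbytes at `rate` bytes per second."""
    return access_time + nbytes / rate

T3 = 5e-3         # hypothetical access time of M3, in seconds
B3 = 1.0e6        # hypothetical transfer rate of M3, in bytes per second
N1, N3 = 64, 256  # hypothetical page sizes, in bytes

tm_30 = transfer_time(T3, N1, B3)  # read through from M3 to the processor
tm_32 = transfer_time(T3, N3, B3)  # full N3 page from M3 into M2
print(f"Tm30 = {tm_30 * 1e3:.3f} ms, Tm32 = {tm_32 * 1e3:.3f} ms")
```

Whatever values are chosen, the first time is the smaller one
as long as N1 is less than N3, since both share the same
access time and transfer rate.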

4.4.2 Availability and Serviceability

The read through mechanism described above offers some
important advantages to the availability and serviceability
of the storage system. Note that all storage levels are
connected to the gated data bus, not directly to each other.
If a storage level must be removed from the system for
servicing, it is merely necessary to manually set both Gin
and Gout "on". In this case, the information is really
"read through" this level as if it didn't exist. No other
changes are needed to any of the other storage levels or the
storage management algorithms, although we would expect the
performance to decrease.

4.5 Store Behind

Under normal steady-state operation, all the levels of
the storage hierarchy will be full (except possibly level
L). Thus, whenever a page is to be moved into a level, it
is necessary to remove a current page. If the page selected
for removal has not been changed by means of a processor
"store", the new page can be immediately stored into the
level since a copy of the removed page already exists in the
next lower level of the hierarchy. If the processor
generates a "store" request, all levels that contain a copy
of the information being modified must be updated. This can
be accomplished in three basic ways: (1) store through, (2)
store replacement, or (3) store behind.

4.5.1 Store Through

Under a store through policy, all levels are
simultaneously updated whenever the processor generates a
"store" request. This is the obvious inverse of the read
through policy. But, there is a crucial distinction. Under
read through, only storage levels 1 through n are used,
where n is the highest level containing the requested
information. Store through must update the contents of
levels n through L. Thus, read through speed is limited by
its slowest level affected, level n; store through is always
limited by the speed of level L, the slowest level of them
all. If 20% of all processor requests are "stores", the
system success frequency function at level L will be at
least 20%. Due to its large average access time, level L
will be the dominant portion of the system's effective
access time.
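
The effect of the 20% figure can be illustrated with a short
calculation. The two access times below are hypothetical
values chosen for the illustration, not figures from Table 1,
and the function name is likewise an assumption.

```python
def store_through_lower_bound(store_fraction, t_level_L):
    """Every store must reach level L, so the effective access time is
    at least store_fraction * t_level_L."""
    return store_fraction * t_level_L

t1 = 0.1e-6   # hypothetical level 1 access time: 0.1 microsecond
tL = 5e-3     # hypothetical level L access time: 5 milliseconds
bound = store_through_lower_bound(0.20, tL)
print(f"effective access time >= {bound * 1e6:.0f} microseconds "
      f"({bound / t1:.0f} times the level 1 access time)")
```

Under these assumed values the stores alone keep the
effective access time several orders of magnitude above that
of level 1, regardless of how well the fetches behave.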

Store through can be used efficiently only if the
access time of level L is comparable to the access time of
level 1, such as in a two-level cache system. In fact, it
is used in some cache systems, such as the IBM System/370
Models 155 and 165 [52].

4.5.2 Store Replacement

Under a store replacement policy, the processor only
stores into M1. If a changed page is later selected for
removal, it is then moved to the next lower level, M2,
immediately prior to being replaced. This process occurs at
every level and, eventually, level L will be updated but
only after the page has been selected for removal from all
the higher levels. Due to the extra delays caused by
updating changed pages before replacement, the effective
access time for fetches is increased. Various versions of
store replacement are used in most two-level paging systems
since it offers substantially better performance than store
through for slow second level storage devices (e.g., drums
and disks).

4.5.3 Store Behind

Store behind is a compromise strategy that bridges the
gap between store through and store replacement and offers
substantially better performance. In both strategies above,
the storage system was required to perform the update
operation at some specific time (e.g., at the instant of the
"store" request for store through or at the instant of
removal for store replacement). Once the information to be
stored has been accepted by the storage management system,
the processor doesn't really care how or when the copies in
the storage hierarchy are updated. Store behind takes
advantage of this degree of freedom. Due to the large
disparity between average access time and transfer rate for
most levels, the maximum data transfer capacity is rarely
reached (i.e., at any instant of time, a storage level may
not have any outstanding requests for service or it may be
waiting for proper positioning to service a pending
request). During these "idle" periods, data can be
transferred down to the next level of the storage hierarchy
without affecting or delaying any fetch operation. Since
these "idle" periods are usually very frequent under most
actual circumstances, there can be a continual flow of
changed information down through the hierarchy towards level
L.
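
A minimal sketch of the store behind idea follows. It is not
taken from the thesis: the class name, the per-level queue of
changed pages, and the idle_tick method are all assumptions
introduced to show how updates can be deferred to idle
periods and still flow down to level L.

```python
from collections import deque

class StoreBehindLevel:
    """One storage level that defers copying changed pages downward."""

    def __init__(self, name, lower=None):
        self.name = name
        self.lower = lower          # next lower level, or None for level L
        self.pages = {}             # page id -> contents held at this level
        self.pending = deque()      # changed pages not yet copied downward

    def store(self, page, data):
        """Accept a store immediately; the lower level is updated later."""
        self.pages[page] = data
        if self.lower is not None and page not in self.pending:
            self.pending.append(page)

    def idle_tick(self):
        """Called when this level has no fetch to service.
        Copies one changed page down; returns False if nothing was pending."""
        if not self.pending:
            return False
        page = self.pending.popleft()
        self.lower.store(page, self.pages[page])
        return True

m3 = StoreBehindLevel("M3")
m2 = StoreBehindLevel("M2", lower=m3)
m1 = StoreBehindLevel("M1", lower=m2)

m1.store("p", "new contents")   # the processor continues at once
m1.idle_tick()                  # an idle period at M1 updates M2
m2.idle_tick()                  # an idle period at M2 updates M3
print(m3.pages["p"])
```

In this sketch a page that is stored into twice before an
idle period is copied down only once, with its latest
contents, which is one possible benefit of deferring the
update.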

4.6 Automatic Management 

Although an effective storage management system should
attempt to minimize page movement and its associated
"housekeeping", there will still be a substantial amount of
work required to manage the hierarchy. It is desirable to
remove as much as possible of the storage management from
the concern of the processor and the programs running on the
processor, including the operating system. There are two
primary motivations for this objective: (1) the storage
hierarchy should function as an independent component of the
system to eliminate any added complexity to the processor or
programs, and (2) we want to conserve the processor's
computational powers for solving the user's problems rather
than for "system overhead". In actuality, of course, the
storage hierarchy can not be divorced entirely from the rest
of the system, but the remaining interdependencies should be
minimal.

4.6.1 Distributed Control 

In the hierarchical storage system described above, all
storage management operations can be determined local to a
single level or, at most, in consideration of information
from neighboring levels. Thus, it is possible to distribute
the control of the hierarchy into the levels; this also
facilitates parallel and asynchronous operation in the
hierarchy.

In a comprehensive multiple level storage hierarchy, as
illustrated in Table 1, this automatic and distributed
control can be accomplished by using two mechanisms: (1)
processor functions, and (2) "intelligent" controllers.

4.6.1.1 Processor Functions

The management of the first storage level must operate
at speeds comparable to the processor. As a result, it is
usually necessary to incorporate the first level store and
its associated management operations into the processor
hardware itself. This approach is used in the IBM
System/370 cache systems [52].

It is often desirable to incorporate the management of
the second storage level also into the processor. This
level requires substantial performance to handle the demands
for service from the first storage level. Since its
requirements are not quite as demanding as the first level,
it is an ideal candidate for firmware control, assuming that
the processor is microprogrammed. This approach has not been
used in any current commercial systems, although the
integrated (i.e., microprogrammed) channels of certain
models of the IBM System/370 are based upon similar
concepts. There have been a few experimental systems, such
as the VENUS System at MITRE, which provide processor
functions to essentially manage the paging system via
microprogramming.

4.6.1.2 "Intelligent" Controllers

For the third storage level and beyond, the storage
management performance requirements are much more modest
since most of the storage activity should occur at the first
and second levels. For these lower levels, it is possible
to develop independent storage management control facilities
for each level. This can be accomplished by extending the
functionality of conventional device controllers. Some
recent sophisticated device controllers are microprogrammed
and are already capable of performing the storage management
function [1].

4.6.2 Multiprogramming

Up to now we have tacitly assumed that the processor
becomes idle whenever it is necessary to fetch information
from the storage hierarchy. This may be a reasonable policy
for two-level cache systems since the processor is never
idle for more than one or two microseconds at a time. But,
for paging systems and general multiple level storage
hierarchies, the processor may be idled for periods of
hundreds or thousands of microseconds at a time. It is
worthwhile to try to find useful work for the processor
while the storage hierarchy is retrieving the requested
information.

In most conventional computer systems, processor idle
time is utilized by multiprogramming. This requires that
there be multiple programs available to be run. Whenever
one program must be delayed due to a time-consuming storage
request, the processor is switched to another program.
Under reasonable circumstances (e.g., many programs ready
for execution and moderate load on the storage system), it
is possible to keep the processor continually busy. Thus,
the effective system storage access time will very
closely approximate T1.

Unfortunately, the process of switching execution from
one program to another can result in a considerable amount
of processor overhead. For example, an early version of the
Multics operating system was reported to require 10
milliseconds to switch programs; typical operating systems
require up to 1 millisecond. The time required to
accomplish this multiprogram switch can be drastically
reduced if the multiprogramming management is also
incorporated into the processor along with the first and
second storage level management. Although the particular
purposes were different, hardware supported multiprogramming
has been available on several computing systems, such as the
Honeywell 830 series [46] and more recently in the Singer
System Ten [30]. The less frequently executed operating
system functions, such as job scheduling and time-sharing
management algorithms, can be supported by the software
operating system as on conventional systems without adversely
affecting performance.

4.7 Comments on the Storage Hierarchy System Design

This chapter has presented the key concepts of a
general multiple level storage hierarchy system. Many of
the particular details of the system will require
considerable investigation and experimentation to determine
an optimal implementation. Three important factors are
extensively studied in the following chapters: (1) other
page size considerations, (2) removal algorithms, and (3)
relevant models for program reference behavior.

CHAPTER 5.
ANALYSIS OF PAGE SIZE CONSIDERATIONS

5.0 Introduction

One of the most important parameters of a storage
hierarchy system is the page size, the unit of information
transfer between two levels of the hierarchy. In this
chapter, the factors influencing page size are examined from
the device characteristics viewpoint and the program
behavior viewpoint.

5.1 The Page Size Issue

In contemporary two-level paging systems (based upon
two devices similar to devices 2 and 4 of Table 1), the page
size is usually quite large (typically 4096 bytes for paging
systems) to take advantage of M2's large transfer rate to
compensate for its slow access time. Such a large page size
is justified by reliance on the Principle of Locality.
Considering the devices of Table 1 for example, a single
byte can be accessed and transferred between M1 and M2 in
about 5 milliseconds whereas 4096 contiguous bytes can be
fetched in 7.8 milliseconds, only 56% more time.
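
The 56% figure follows directly from the two times quoted
above, as the short calculation below shows. The implied
transfer rate is derived under the assumption that the
single-byte time is essentially all access time; the variable
names are illustrative only.

```python
access_ms = 5.0       # time to fetch a single byte (taken as pure access time)
page_ms = 7.8         # time to fetch 4096 contiguous bytes
page_bytes = 4096

extra = page_ms / access_ms - 1.0            # relative increase in fetch time
transfer_ms = page_ms - access_ms            # time spent moving the data itself
rate = page_bytes / (transfer_ms / 1000.0)   # implied transfer rate, bytes/second

print(f"extra time: {extra:.0%}")
print(f"implied transfer rate: {rate / 1e6:.2f} million bytes per second")
```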
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5, 1. 1 Page Size Investigations 

Although paging systems have been used successfully,
the effect of page size has become the subject of increasing
investigation. This interest has been aroused due to several
considerations:

1. It has been noted by Denning [26] that the
utilization of M1 is maximized and "page breakage" minimized
by using rather small pages, such as 200 bytes. In
particular, he emphasizes:

     "These results are significant ... small pages
     permit a great deal of compression without loss of
     efficiency. Small page sizes will yield significant
     improvements in storage utilization ..."



2. The success of cache systems indicates that the
Principle of Locality applies on the microscopic scale as
well as the macroscopic scale of conventional paging
systems.

3. The recent introduction of several new device
technologies, such as the "semiconductor drum" [35] with an
average access time of about 100 microseconds, drastically
reduces the benefits of very large page sizes in a paging
system.

4. Although most current multilevel systems employ
only two levels, this thesis is concerned with multiple
level storage hierarchies (i.e., three or more levels). In
fact, storage systems with six or more levels are quite
plausible. A deep understanding of the effects of various
page sizes is essential to the development of such systems.

Thus, although there are many reasons for considering
new page sizes, there is not a complete understanding of the
impact of such a change. Denning [26] sums up our current
knowledge as follows:

     "Two factors primarily influence the choice of page
     size: fragmentation and efficiency of page-transport
     operation."

In this chapter some other factors of potentially crucial
importance will be discussed.

5.2 Anomalies

One of the more intriguing and frustrating aspects of
complex systems, such as paging systems, is the occurrence
of anomalies (i.e., phenomena that are contrary to "common
sense"). For example, Belady [10] has shown that certain
storage management removal algorithms, in particular FIFO
(first-in first-out), may actually cause performance to
decrease as the capacity of M1 is increased. This result is
contrary to the general belief that "more main memory makes
things work out better". Thus, one must exercise
considerable care when considering "tinkering" with the
parameters, such as page size, of a multilevel storage
system.
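
The FIFO anomaly cited above can be reproduced with a very
small simulation. The sketch below is illustrative: the
function name is an assumption, and the reference string is
a commonly cited example of this anomaly rather than one
taken from this thesis.

```python
from collections import deque

def fifo_fetches(trace, frames):
    """Count page fetches for a demand-fetch store with FIFO removal."""
    resident = deque()               # page resident longest is at the left
    fetches = 0
    for page in trace:
        if page not in resident:
            fetches += 1
            if len(resident) == frames:
                resident.popleft()   # remove the page that entered first
            resident.append(page)
    return fetches

trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_fetches(trace, 3), fifo_fetches(trace, 4))   # prints "9 10"
```

With three page frames the trace causes 9 fetches; with four
frames it causes 10, so enlarging the primary store makes
this particular trace perform worse.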

The objective of this chapter is to present and analyze 
some anomalies encountered when the page size parameter is 
changed in a paging system. 

5.3 The Page Size Anomaly

For simplicity, let us start by considering the effect
of decreasing the page size used in a two-level system, S,
from N to N' where N' = N/2 in this new system, S'. In
particular, we wish to investigate the effects upon the
failure frequencies which are f and f', respectively. We
define the ratio f'/f to be r. The possible results can be
partitioned into three interesting regions:

1. r < 1.

2. 1 <= r <= 2.

3. r > 2.



5.3.1 Case 1: r < 1 ( f' < f )

This would be a highly desirable result since the
number of page fetches is actually decreased. Furthermore,
the time required to access and transfer a page of size N'
would be expected to be less than that required for the
larger page size N. Figure 6 illustrates an instance of this
case. In converting an address trace to a page trace for N',
the logical page addresses p+ and p- are used to represent
the two halves of the page p of size N. Note that when using
a page size of N/2 instead of N, M1 actually holds twice as
many pages though each page is only half as large.

Parameters

     As seen by S:
          P = a, b, c, a, b, c
          |P| = 6
          Q = { a, b, c }
          |Q| = 3
          |M1| = 2
          FIFO Removal

     As seen by S':
          P = a+, b+, c+, a+, b+, c+
          |P| = 6
          Q = { a+, b+, c+ }
          |Q| = 3
          |M1| = 4
          FIFO Removal

Simulation

     Page Trace:        a+  b+  c+  a+  b+  c+

     S
     Fetch:             *   *   *   *   *   *
     M1 Contents:       a   b   c   a   b   c
                            a   b   c   a   b

     S'
     Fetch:             *   *   *
     M1 Contents:       a+  b+  c+  c+  c+  c+
                            a+  b+  b+  b+  b+
                                a+  a+  a+  a+

Results

     F  = 6
     F' = 3
     r  = 3/6 = 0.5

Figure 6.
Example of Case 1

In the example of Figure 6, r = 0.5, which means that
the number of page fetches was cut in half by using the
smaller page size N'. This type of result might be expected
from a program that exhibited a rather sparse and
non-localized reference behavior. Recall that in typical
two-level paging systems, a page of size 4096 bytes is
fetched even though a single reference uses only a few
bytes. Unless the program immediately makes many more
references to this page, much of it will have been fetched
but not used. Under these circumstances, M1 might be better
utilized by holding a larger and more diversified collection
of pages, even if each page were smaller.
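
The counts in Figure 6 can be checked with a short FIFO
simulation. The sketch is illustrative only; the function
and variable names are assumptions, and a half page is
written as a letter followed by "+" or "-".

```python
from collections import deque

def fifo_fetches(trace, frames):
    """Count page fetches for a demand-fetch store with FIFO removal."""
    resident = deque()               # page resident longest is at the left
    fetches = 0
    for page in trace:
        if page not in resident:
            fetches += 1
            if len(resident) == frames:
                resident.popleft()   # remove the page that entered first
            resident.append(page)
    return fetches

half_trace = ["a+", "b+", "c+", "a+", "b+", "c+"]   # page trace as seen by S'
full_trace = [h[0] for h in half_trace]             # same references as seen by S

F = fifo_fetches(full_trace, 2)         # |M1| = 2 pages of size N
F_prime = fifo_fetches(half_trace, 4)   # |M1| = 4 pages of size N/2
r = F_prime / F
print(F, F_prime, r)                    # prints "6 3 0.5"
```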

5.3.2 Case 2: 1 <= r <= 2 ( f <= f' <= 2f )

This is a transitional region. For r = 1, S' will
perform better than S since the number of page fetches is
the same and the time required for each fetch is less. For r
= 2, S' will require twice as many page fetches. This will
usually swamp any page transfer benefit derived from the
smaller page size, thus S would perform better. The specific
point of transition, r', depends largely upon the time
required to access and transfer a page, T and T'
respectively in S and S', such that r' = T/T'.
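
To give a feeling for where the transition point may lie,
the sketch below computes r' = T/T' from the timing figures
quoted in Section 5.1 (about 5 milliseconds of access time
and 7.8 milliseconds to fetch 4096 bytes). It assumes that
transfer time grows linearly with page size, which is an
assumption of this illustration; the function name is
hypothetical.

```python
def fetch_time_ms(nbytes, access_ms=5.0, ms_per_byte=2.8 / 4096):
    """Access time plus transfer time, with transfer time linear in page size."""
    return access_ms + nbytes * ms_per_byte

T = fetch_time_ms(4096)         # page of size N in S
T_prime = fetch_time_ms(2048)   # page of size N' = N/2 in S'
r_transition = T / T_prime      # S' spends less total fetch time only if r is below this

print(f"T = {T:.1f} ms, T' = {T_prime:.1f} ms, r' = {r_transition:.2f}")
```

Since the total fetch time is proportional to f x T in S and
to f' x T' in S', the smaller page size pays off only while
r = f'/f stays below T/T'. With these figures the transition
point is only a little above 1, because the access time
dominates both T and T'.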

Figure 7 illustrates an extreme example of Case 2 where
r = 2.0. This means that the number of page fetches was
doubled by using the smaller page size N'. This type of
result might be expected from a program that exhibited a
dense, localized, and sequential reference behavior.

Intuitively, the r = 2.0 result is the "worst" case
since we are being forced to always load both the p+ and p-
halves of each original page p, thereby losing all the
benefits of the smaller N' page size and incurring twice as
many actual page faults. This intuitive observation is
false; r = 2.0 is not the "worst" case.

5.3.3 Case 3: r > 2 ( f' > 2f )

This third region, besides being intuitively
impossible, is clearly undesirable. Since the number of page
fetches required would be more than doubled, the performance
of S' would be undoubtedly worse than S. Depending upon the
actual value of r, the performance could be much worse.
Figure 8 illustrates a reference pattern that produces a
result of r = 2.75. This region of operation will be the
subject of discussion for the remainder of this chapter. We
formalize this situation by the following existence theorem.

Figure 7.
Example of Case 2

THEOREM 1:

     There exists a page trace, P, and demand-fetch
     FIFO-removal two-level storage systems, S and S', with
     page sizes N and N'=N/2, respectively, such that the
     ratio, r, of fetch frequency f' to f exceeds 2.

Proof:

     By example (Figure 8).



5.3.4 Other Removal Algorithms

Theorem 1 states the anomaly that decreasing page size
by a factor of two can cause the page fetch frequency to
increase by more than a factor of two. The two-level
demand-fetch conditions of Theorem 1 are typical of most
contemporary paging systems. But, to put this situation into
perspective, other removal algorithms must be considered.
Due to its simplicity, the FIFO removal algorithm was used
in many of the early paging systems. In recent times it has
been found that FIFO has certain disturbing peculiarities
(e.g., the system's success frequency, s, is not a monotonic
function of primary store size, |M1| [10]). Furthermore,
other removal algorithms have been found to be empirically
closer approximations to the "optimal" removal algorithm,
MIN [11]. MIN itself is not physically realizable since it
requires future knowledge, but it can be used as a basis for
performance comparison with practical algorithms.

Various forms of the "least recently used" (LRU)
removal algorithm have become popular in contemporary
systems. Under LRU, the page selected for removal from the
primary store is the one that has not been referenced for
the longest time (i.e., the least recently used page).
Empirically, LRU has been found to closely approximate the
performance of the "optimal" algorithm for many actual
programs. Furthermore, Mattson et al [63] have studied LRU
and found that it is a member of a general class of removal
algorithms called "stack algorithms". The class of stack
algorithms, as noted by Denning [25], "contains all the
'reasonable' algorithms". In particular, stack algorithms
all satisfy an inclusion property that results in well
behaved characteristics. For example, it has been proven
that all stack algorithms, including LRU, have a success
frequency that is a monotonic function of primary store size
and are immune to the FIFO peculiarity observed by Belady.
Thus, one might be tempted to assume that the page size
anomaly is also a phenomenon unique to FIFO removal and
would not occur if a "well behaved" removal algorithm, such
as LRU, were used. This expectation can be rapidly destroyed
by observing Figure 9, which is the same system as Figure 8
but with an LRU removal algorithm. In this example, the page
fetch frequency ratio, r, is 2.2 which still exceeds 2. This
result leads us to Theorem 2 and Corollary 2a.

THEOREM 2:

     There exists a page trace, P, and demand-fetch
     LRU-removal two-level storage systems, S and S', with
     page sizes N and N'=N/2, respectively, such that the
     ratio, r, of fetch frequency f' to f exceeds 2.

Proof:

     By example (Figure 9).

COROLLARY 2a:

     Given a page trace, P, and demand-fetch two-level
     storage systems, S and S', with page sizes N and
     N'=N/2, respectively, the use of a "stack" removal
     algorithm (i.e., an algorithm with the "inclusion
     property") is not sufficient to guarantee that the
     ratio, r, of fetch frequency f' to f will be bounded by
     2.
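
The existence claim for LRU removal can also be checked by a
short simulation. The trace used below was constructed for
this illustration and is not the trace of Figure 9; the
function and variable names are likewise assumptions. As
before, a half page is written as a letter followed by "+"
or "-".

```python
def lru_fetches(trace, frames):
    """Count page fetches for a demand-fetch store with LRU removal."""
    stack = []                       # most recently used page first
    fetches = 0
    for page in trace:
        if page in stack:
            stack.remove(page)       # hit: page will be moved to the front
        else:
            fetches += 1
            if len(stack) == frames:
                stack.pop()          # remove the least recently used page
        stack.insert(0, page)
    return fetches

# Half-page trace as seen by S'; x+ and x- are the two halves of page x.
half_trace = ["a-", "c+", "a+", "b+", "b-", "a-", "c+", "a+", "c-"]
full_trace = [h[0] for h in half_trace]   # same references as seen by S

F = lru_fetches(full_trace, 2)            # |M1| = 2 pages of size N
F_prime = lru_fetches(half_trace, 4)      # |M1| = 4 pages of size N/2
print(F, F_prime, F_prime / F)            # prints "4 9 2.25"
```

In S a reference to either half of a page keeps the whole
page recently used, whereas in S' the unreferenced half ages
and is removed, so S' must fetch it again while S still holds
it. For this trace every reference in S' causes a fetch, and
the ratio r is 2.25.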
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Figure 9. 
Example of Case 3
(for LRU Removal)

5.4 Significance of the Page Size Anomaly

The previous theorems prove that there exist page
traces that result in significantly increased page fetch
frequencies if the page size is decreased. It is necessary
to consider the likelihood of encountering such page trace
patterns in actual programs. For example, it can be proven
that, as you are reading this sentence, all the molecules of
air in the room may suddenly move towards the opposite
corner and cause you to suffocate. If you survived the last
sentence, you have probably deduced that the likelihood of
that event is extremely small, fortunately.

5.4.1 Simulation Studies 

Hatfield [48] and Seligman [78] have performed
experiments that indicate that the page size anomaly is very
common, if not inevitable, in actual programs. In both cases
actual programs were monitored and their corresponding page
trace reference strings were recorded, usually on magnetic
tape. Then simulators were developed that mimicked the
software and hardware of the two-level storage systems then
in use or being considered. By supplying the monitored page
traces as inputs to the simulators, the performance of such
a system can be accurately measured. These simulators were
scrupulously accurate, not just approximations. The validity
of these results has been confirmed in some cases by
running the real programs under a real two-level storage
system.

5.4.2 Hatfield Studies

Hatfield [48] performed studies in the hardware
environment of the IBM System/360 Model 67 with programs
running under the CP-67/CMS Operating System. The simulated
performance was measured for various page sizes, N, and
various primary store sizes, |M¹|. In summary, it was
confirmed that certain programs, which were viewed as
examples of low-density storage use, resulted in decreased
page fetch frequency when page size was decreased. But, it
was observed that for programs with much greater
localization of heavily used storage:

"not only does the smaller page size often generate
nearly twice as many page fetches as the large page
size, it often resulted in more than twice the page
fetches, contrary to our intuitions."

In particular, the substantially increased page fetch
frequency appears to be:

"a characteristic of programs which have a high
locality and therefore perform well on systems using
relocation hardware for address translation and is
characteristic of those programs in the region of
low paging rate."

In other words, the anomaly is most prevalent in programs
"optimized" for performance in a two-level storage system
when running under nearly "optimal" conditions!

5.4.3 Seligman Studies

Whereas Hatfield was concerned with a paging system
with page sizes in the range from 2048 to 16384 bytes,
Seligman [78] analyzed a proposed cache system with much
smaller page sizes in the range of 8 to 256 bytes. He
observed that:

"interestingly, the missing page probability (for
this data) is minimized for a page size which
increases slowly with total memory size. Note that
the associative memory organization, where page size
equals one word, is not optimum; to borrow a phrase
from economics, the marginal utility of the extra
words fetched in a page is higher than that of those
displaced".



Thus, continual decreasing of page size appears to have an 
inevitable adverse effect upon system performance. 

5.4.4 Other Questions Raised

Now that it has been shown that the page size anomaly
is theoretically possible and likely to occur in practice,
there are several other questions of interest. Since it has
been proven that the page fetch frequency ratio is not
bounded by r = 2, what bounds, if any, do exist? Hatfield
implicitly raised another question by the statement:

"as yet we have been unable to prove that there is a
replacement algorithm using only the past history of
page requests which cannot generate more than twice
the exceptions with half size pages."

The answers to these questions are the subjects of the
following sections and chapters.

5.5 Bounds on the Page Fetch Frequency Ratio

It has been shown that the page fetch frequency ratio
can exceed r = 2, but just how bad can it get? Of equal
importance, what factors influence this bound? These
questions will be discussed in this section.

5.5.1 Cyclic Page Traces 

Figures 10 and 11 represent page trace simulations for
two sets of demand-fetch LRU-removal two-level storage
systems with primary store sizes |M¹|=2 and |M¹|=3,
respectively. In both cases, it can be observed that the
page trace simulated is cyclic with a repeated pattern, Pc.
In Figure 10, the page trace consists of the repeated
pattern:

   Pc = a+ b+ c+ c- b- a-

whereas Figure 11 repeats the similar pattern:

   Pc = a+ b+ c+ d+ d- c- b- a-

Figure 10.
Cyclic Page Trace with |M¹| = 2

5.5.2 Steady State Cyclic Page Traces

Let us consider Figure 10 first. The page fetch ratio,
r, is 2.0 in this case. As noted earlier, the page trace can
be subdivided into an initial transient stage, Pt, with a
high page fetch frequency followed by a steady-state stage,
Ps, with usually a lower page fetch frequency. In Figure 10,
the first Pc cycle contains the entire start-up transient
stage and completely fills all the available space in M¹.
Thus, the second Pc cycle represents the start of the
steady-state stage. Furthermore, since the content and page
ordering of M¹ are exactly the same at the end of the second
cycle as they were at the beginning of that cycle for both S
and S', the page trace cycle, Pc, can be repeated
continuously with exactly the same results each time for
page fetch requests and M¹ contents. If /r/ is defined to be
the page fetch frequency ratio for the first steady-state
period, Pc, of a cyclic page trace, (Pc)*, /r/ is also the
page fetch frequency ratio for the entire steady-state
portion of the page trace defined by the regular expression:

   P = Pt·Ps = Pt·(Pc)*

As the length of the page trace, |P|, becomes large in
comparison with the length of the transient stage, |Pt|, the
overall page fetch frequency ratio, r, asymptotically
approaches the value of the steady-state cycle page fetch
frequency ratio, /r/. In Figure 10, /r/ = 3.0, thus r will
increase from 2.0 towards 3.0 as the page trace is
lengthened by continually repeating the pattern Pc. Thus,
the page fetch frequency ratio, r, for the page trace

   P = ( a+ b+ c+ c- b- a- )*

is bounded by 3.0 when |M¹| = 2.
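
The approach of r toward /r/ can be checked numerically. The
following is a minimal sketch in Python, not part of the
original simulations; the names lru_fetches and overall_ratio
are our own, and the trace is the Figure 10 pattern with S
holding |M¹| = 2 pages and S' holding 4 half-size pages.

```python
from collections import OrderedDict

def lru_fetches(trace, capacity):
    """Count demand fetches for an LRU-managed primary store."""
    store = OrderedDict()                  # least recently used page first
    fetches = 0
    for page in trace:
        if page in store:
            store.move_to_end(page)        # hit: only the LRU ordering changes
        else:
            fetches += 1                   # miss: demand fetch
            if len(store) >= capacity:
                store.popitem(last=False)  # remove the least recently used page
            store[page] = True
    return fetches

# One cycle of the Figure 10 pattern as seen by S' (half-size pages).
CYCLE_HALF = ["a+", "b+", "c+", "c-", "b-", "a-"]
# The same cycle as seen by S: both halves map to the full-size page.
CYCLE_FULL = [p[0] for p in CYCLE_HALF]

def overall_ratio(cycles, m1=2):
    """r = F'/F after `cycles` repetitions; S holds m1 pages, S' holds 2*m1."""
    f_full = lru_fetches(CYCLE_FULL * cycles, m1)
    f_half = lru_fetches(CYCLE_HALF * cycles, 2 * m1)
    return f_half / f_full

for k in (1, 2, 10, 100, 1000):
    print(k, round(overall_ratio(k), 3))
```

With two cycles the sketch reproduces r = 2.0, and as the
number of cycles grows the printed ratio climbs toward, but
never reaches, 3.0.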

A similar situation is illustrated in Figure 11. In
this example, r = 2.28 and /r/ = 4.0. Thus, the page fetch
frequency ratio, r, for the page trace

   P = ( a+ b+ c+ d+ d- c- b- a- )*

is bounded by 4.0 when |M¹| = 3. By generalizing these
examples, we arrive at Theorem 3 and Corollary 3a.

THEOREM 3:

For any two demand-fetch LRU-removal two-level storage
systems, S and S', with page sizes N and N'=N/2 and
primary store sizes |M¹| and |M¹|'=2|M¹|, respectively,
there exists a cyclic page trace, P = (Pc)*, where |Pc|
= 2(|M¹|+1), such that the steady-state page fetch
frequency ratio, /r/, equals |M¹|+1.

Proof:

(See below).

As seen by S:

   P = a,b,c,d,d,c,b,a,a,b,c,d,d,c,b,a
   |P| = 16
   Q = { a, b, c, d }
   |Q| = 4
   |M¹| = 3
   LRU Removal

As seen by S':

   P = a+,b+,c+,d+,d-,c-,b-,a-,a+,b+,c+,d+,d-,c-,b-,a-
   |P| = 16
   Q = { a+, a-, b+, b-, c+, c-, d+, d- }
   |Q| = 8
   |M¹| = 6
   LRU Removal

Results

   F  = 7
   F' = 16
   r  = 16/7 = 2.28

For the steady-state cycle:

   • F = 2
   • F' = 8
   • /r/ = 8/2 = 4.0

Figure 11.
Cyclic Page Trace with |M¹| = 3

COROLLARY 3a:

For any two demand-fetch LRU-removal two-level storage
systems, S and S', with page sizes N and N'=N/2 and
primary store sizes |M¹| and |M¹|'=2|M¹|, respectively,
there exists a cyclic page trace, P = (Pc)*, where |Pc|
= 2(|M¹|+1), such that the overall page fetch frequency
ratio, r, asymptotically approaches the bound |M¹|+1 as
|P| approaches infinity.



5.5.3 Proof of Theorem 3 

5.5.3.1 Notation and Properties 
Assume a fixed page size N and primary store of size S¹; let
n = the number of pages in M¹ (i.e., n = |M¹| = S¹/N). It
has been shown by Mattson et al [63] that a demand-fetch
LRU-removal algorithm has the following properties:

P1. If M¹ is initially empty, it fills with the first
    n distinct pages referenced by the trace.
P2. At any time t, M¹ contains the n most recently
    referenced distinct pages.
P3. a) LRU satisfies the inclusion property

          M¹(1) ⊂ M¹(2) ⊂ ... ⊂ M¹(m)

       where M¹(1) means the contents of M¹ if n=1,
       etc.
    b) At any time t after M¹ has become filled, there
       is a strict removal ordering referred to as the
       LRU stack

          S = { s(1), s(2), ..., s(n) }

       where

          s(i) = M¹(i) - M¹(i-1)   for i = 1, 2, ..., n

       and s(n) is the page to be removed next.



5.5.3.2 Definition 3-a: 

For any integer n, let us consider a page trace, P°,
consisting of the repeated pattern, Pc°, of length |Pc°| =
2(n+1)

   P° = Pc°[n]*

where

   Pc°[n] = { Pc°(1), Pc°(2), ..., Pc°(2n+1), Pc°(2n+2) }.

The Pc°(i)s are defined as follows:

   Pc°(i) = 2(i-1)      for i = 1, ..., n+1
   Pc°(i) = 4n+5-2i     for i = n+2, ..., 2n+2

Thus, for n = 2 --

   Pc°[2] = { 0, 2, 4, 5, 3, 1 }

and

   P°[2] = { 0, 2, 4, 5, 3, 1, 0, 2, 4, 5, 3, 1, ... }

The cyclic page trace pattern, Pc°[n], is used to define
corresponding cyclic page trace patterns, Pc[n] and Pc'[n],
for S and S', respectively. These are defined as follows --
For a given value of n and i = 1, 2, ..., 2n+2

   Pc(i)  = integer[Pc°(i)/2]

   Pc'(i) = (integer[Pc°(i)/2])+   if rem[Pc°(i)/2] = 0
   Pc'(i) = (integer[Pc°(i)/2])-   if rem[Pc°(i)/2] = 1

Thus, for n = 2 --

   P[2]  = { 0, 1, 2, 2, 1, 0, 0, 1, 2, 2, 1, 0, ... }
   P'[2] = { 0+, 1+, 2+, 2-, 1-, 0-, 0+, 1+, 2+, 2-, 1-, 0-,
             ... }

We can see that these page traces are identical to the page
traces of Figure 10 with appropriate relabeling (i.e., a=0,
b=1, c=2).
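
Definition 3-a translates almost directly into code. The
sketch below is only an illustration in Python; the function
names are hypothetical and do not appear in the thesis, and
half-size pages are written as strings such as "2+" and "2-".

```python
def base_pattern(n):
    """Pc°[n]: 2(i-1) for i = 1..n+1, then 4n+5-2i for i = n+2..2n+2."""
    rising = [2 * (i - 1) for i in range(1, n + 2)]
    falling = [4 * n + 5 - 2 * i for i in range(n + 2, 2 * n + 3)]
    return rising + falling

def full_page_pattern(n):
    """Pc[n] as seen by S: each reference maps to page integer[x/2]."""
    return [x // 2 for x in base_pattern(n)]

def half_page_pattern(n):
    """Pc'[n] as seen by S': '+' half if rem[x/2] = 0, '-' half otherwise."""
    return [f"{x // 2}{'+' if x % 2 == 0 else '-'}" for x in base_pattern(n)]

print(base_pattern(2))
print(full_page_pattern(2))
print(half_page_pattern(2))
```

For n = 2 the three printed lists are one cycle each of P°[2],
P[2], and P'[2] as listed above.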



5.5.3.3 Lemma 3-b:

The page references of the set

   { Pc(1), ..., Pc(n+1) }

are distinct.

Proof:

Based upon the definitions of Pc°[n] and Pc[n], we see
that

   For i = 1, ..., n+1
   Pc(i) = integer[Pc°(i)/2]
         = integer[2(i-1)/2]
         = integer[i-1]
         = i-1.

Thus, each value of Pc(i) for i = 1, ..., n+1 is distinct.

Q.E.D.



5.5.3.4 Lemma 3-c: 

The page references of the set 

   { Pc(n+2), ..., Pc(2n+2) }

are distinct.

Proof:

Based upon the definitions of Pc°[n] and Pc[n], we see
that

   For i = n+2, ..., 2n+2
   Pc(i) = integer[Pc°(i)/2]
         = integer[(4n+5-2i)/2]
         = integer[2n+2+(1/2)-i]
         = 2n+2-i

Thus, each value of Pc(i) for i = n+2, ..., 2n+2 is
distinct.



5.5.3.5 Lemma 3-d:

At the end of each cycle, Pc[n], of the page trace,
P[n], M¹ contains the pages, in LRU stack order,

   S° = { s°(1), ..., s°(n) }

where

   s°(j) = j-1      for j = 1, ..., n

Proof:

Since each cycle, Pc[n], of P[n] is of length 2n+2
which is greater than n, the S° LRU stack consists of the
last n page references of Pc[n] in reverse order by
properties P2 and P3 and Lemma 3-c. Thus,

   s°(j) = Pc(2n+3-j)

such that

   s°(1) = Pc(2n+2), s°(2) = Pc(2n+1), ..., s°(n) = Pc(n+3).

When j takes on values { 1, ..., n }, 2n+3-j takes on values
{ 2n+2, ..., n+3 }. Thus, for j = 1, ..., n and based upon
Lemma 3-c:

   s°(j) = Pc(2n+3-j)
         = 2n+2-(2n+3-j)
         = j-1.

Q.E.D.



5.5.3.6 Lemma 3-e:

Given a demand-fetch LRU-removal two-level storage
system, S, with page size N, primary store size S¹
containing n=S¹/N pages, the page fetch function, F,
resulting from each steady-state cycle, Pc[n], of the
page trace P has the value 2 (i.e., F[Pc[n]]=2 during
steady state).

Proof:

Let us subdivide the Pc[n] cycle, which is of length
2n+2, into four regions as follows:

   Region 1: Pc¹ = { Pc(1), ..., Pc(n) }
   Region 2: Pc² = { Pc(n+1) }
   Region 3: Pc³ = { Pc(n+2), ..., Pc(2n+1) }
   Region 4: Pc⁴ = { Pc(2n+2) }

and compute the number of page fetches in each region, F¹,
F², F³, F⁴, respectively. Since the page trace regions are
concatenated, the page fetches are cumulative, so we know
that

   F = F¹ + F² + F³ + F⁴.

Region 1: Pc¹ = { Pc(1), ..., Pc(n) }

From Lemma 3-b, we know that

   Pc(i) = i-1      i = 1, ..., n+1

and from Lemma 3-d, we know that at the beginning of each
cycle

   s°(j) = j-1      j = 1, ..., n.

The page references { Pc(1), ..., Pc(n) } are actually the
sequence { 0, ..., n-1 } which is identical to the contents
of M¹ at the start of the cycle, S°. Therefore, no page
transfers are required although LRU stack reordering may
occur (F¹=0).

Region 2: Pc² = { Pc(n+1) }

Page reference Pc(n+1) is page n which is not contained
in S° nor loaded during region 1 (in fact, no pages were
fetched during region 1); thus, a page transfer is required
(F²=1). Using similar techniques as in Lemma 3-d, since each
reference of Pc¹ is distinct, the LRU removal stack at this
point is

   S = { s(1), ..., s(n) }

where

   s(j) = Pc(n+1-j)      j = 1, ..., n.

Page s(n) is selected for removal; this is actually page
Pc(n+1-n)=Pc(1)=0. The new LRU stack ordering becomes

   s(j) = Pc(n+2-j)      j = 1, ..., n.

Region 3: Pc³ = { Pc(n+2), ..., Pc(2n+1) }

The page references { Pc(n+2), ..., Pc(2n+1) } are
actually the sequence { n, ..., 1 } as shown in the proof of
Lemma 3-c. The LRU stack ordering immediately prior to
reference Pc(n+2) is

   S°° = { s(1), ..., s(n) }

which is actually

   { n, ..., 1 }

since it has been shown earlier that at reference Pc(n+2)

   s(j) = Pc(n+2-j)      j = 1, ..., n.

Thus, as in region 1, every page referenced is already
contained in M¹ and there are no page transfers required
(F³=0).

Region 4: Pc⁴ = { Pc(2n+2) }

Page reference Pc(2n+2) is actually page 0. This page
was not contained in S°°, thus a page transfer is required
(F⁴=1).

Therefore, we can conclude

   F[Pc[n]] = F¹[Pc¹] + F²[Pc²] + F³[Pc³] + F⁴[Pc⁴]
            = 0 + 1 + 0 + 1
            = 2.

Q.E.D.



5.5.3.7 Lemma 3-f:

Given a demand-fetch LRU-removal two-level storage
system, S', with page size N'=N/2, primary store size
|M¹| containing 2n=|M¹|/(N/2) pages, the page fetch
function, F', resulting from each steady-state cycle,
Pc'[n], of the page trace P' has the value 2n+2 (i.e.,
F'[Pc'[n]]=2n+2 during steady state).

Proof:

The proof follows directly from the definition of P',
the LRU properties, and the previous Lemmas.

• Each page reference in the cyclic pattern Pc'[n] is
distinct. (This can be easily seen from the definition or
proved in a similar manner to Lemmas 3-b and 3-c).

• Each cycle is 2n+2 references long.

• At any time t, page reference P'(t) = P'(t-2n-2).

• The primary store, M¹, can hold 2n pages in S' since
N'=N/2.

• Since the cyclic pattern only repeats after 2n+2 steps
and M¹ is only 2n pages large, M¹ always holds the last 2n
page references (since they are distinct).

• Thus, at any time t, page reference P'(t) will not
correspond to any page currently in M¹ (i.e., M¹ holds
references { P'(t-1), ..., P'(t-2n) } and P'(t)=P'(t-2n-2)
is not in that set). As a result, a page fetch is required
for every page reference.

• Since there are 2n+2 page references per cycle, there
are 2n+2 page fetches required per cycle. Thus, F'=2n+2.

Q.E.D.



5.5.3.8 Theorem 3: 

For any two demand-fetch LRU-removal two-level storage
systems, S and S', with page sizes N and N'=N/2 and
primary store sizes |M¹| and |M¹|'=2|M¹|, respectively,
there exists a cyclic page trace, P=(Pc)*, where
|Pc| = 2(|M¹|+1), such that the steady-state page fetch
frequency ratio, /r/, equals |M¹|+1.

Proof:

This proof follows trivially from Lemmas 3-e and 3-f.
We know that for each steady-state cycle of S, F=2 (Lemma
3-e). Also, for each steady-state cycle of S', F'=2n+2 (Lemma
3-f). Since the page fetch frequency ratio, r, is defined as
f'/f or (F'/|P|)/(F/|P|) which equals F'/F, we find that in
steady-state

   /r/ = F'/F = (2n+2)/2 = n+1.

Q.E.D.
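
As a numerical cross-check of Lemmas 3-e and 3-f (not a
substitute for the proof), the steady-state cycle can be
simulated for small n. This is a sketch in Python under our
own naming; it assumes that two warm-up cycles are enough to
pass the transient stage, and it uses the integers Pc°(i)
directly as names for the half-size pages of S'.

```python
from collections import OrderedDict

def base_pattern(n):
    """Pc°[n] from Definition 3-a."""
    return ([2 * (i - 1) for i in range(1, n + 2)] +
            [4 * n + 5 - 2 * i for i in range(n + 2, 2 * n + 3)])

def steady_state_fetches(cycle, capacity, warmup_cycles=2):
    """LRU demand fetches during the cycle that follows the warm-up cycles."""
    store = OrderedDict()                  # least recently used page first
    fetches = 0
    for step, page in enumerate(cycle * (warmup_cycles + 1)):
        if page in store:
            store.move_to_end(page)        # hit
            continue
        if len(store) >= capacity:
            store.popitem(last=False)      # remove least recently used page
        store[page] = True
        if step >= warmup_cycles * len(cycle):
            fetches += 1                   # count only the measured cycle
    return fetches

def check_theorem_3(n):
    """Return (F, F', /r/) for one steady-state cycle with |M1| = n."""
    base = base_pattern(n)
    f_full = steady_state_fetches([x // 2 for x in base], n)   # system S
    f_half = steady_state_fetches(base, 2 * n)                 # system S'
    return f_full, f_half, f_half // f_full

for n in range(1, 6):
    print(n, check_theorem_3(n))
```

Each printed triple should agree with F = 2, F' = 2n+2, and
/r/ = n+1.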

5.5.4 Comments on Theorem 3 

The above results expose another facet of the page size
anomaly. As the size of the primary store, M¹, is increased,
the overall page fetch frequency ratio as stated in
Corollary 3a also increases. This means that the larger the
primary store that you have, the more "dangerous" the page
size anomaly becomes. For example, in a two-level paging
system based on devices 2 and 4 from Table 1, |M¹| = 128
pages and N = 4096 bytes, if the page size is decreased by
half to 2048 bytes, it is possible that the page fetch
frequency would increase 129-fold (a 12,800% increase in
paging activity!). Of course, one would assume, or at least
hope, that such pathological page trace patterns would be
very rare, but we know that they can exist. It is
interesting to note that the pathological pattern shown
above (e.g., a+ b+ c+ c- b- a-) corresponds to the expected
references of nested subroutine calls (i.e., subroutine a
calls subroutine b which calls subroutine c, etc., and each
subroutine, of course, returns to its caller). This is also
true of other stack-like program constructs. Such highly
modular program design is quite typical and, furthermore, is
often explicitly encouraged. In view of Hatfield's finding
where the overall r exceeded 2.0 in many programs, it is
reasonable to assume that there were probably regions in
which r was quite small, possibly below 1.0, which were
counterbalanced by regions with very high values of r. At
present we do not have this particular information
available, but if it were true, performance could be greatly
improved by eliminating the high r value regions. This
problem will be discussed in the next section.

5.5.5 Bounds for FIFO Removal Algorithm

Theorem 3 applies to LRU removal algorithms and many
other removal algorithms, although these other cases will
not be explicitly proven in this thesis. It is interesting
to consider whether the result of Theorem 3 applies to the
FIFO removal algorithm. Unfortunately, due to the
peculiarities of FIFO, a simple generalizable cyclic page
trace pattern has not been found. But, isolated examples
have been found, as illustrated in Figure 12, that show that
it is possible for r to exceed |M¹|+1. This result is stated
in Theorem 4. Based upon other examples, it is conjectured
that r, when FIFO removal is used, may be as high as
2|M¹|.

Parameters

As seen by S:

   P = a,c,a,b,b,c,c,a,a,b,b,c,c,a,a,b,b,c,c
   |P| = 19
   Q = { a, b, c }
   |Q| = 3
   |M¹| = 2
   FIFO Removal

As seen by S':

   P = a+,c-,a-,b+,b-,c+,c-,a+,a-,b+,b-,c+,c-,a+,a-,b+,b-,c+,c-
   |P| = 19
   Q = { a+, a-, b+, b-, c+, c- }
   |Q| = 6
   |M¹| = 4
   FIFO Removal

Results

   F  = 6
   F' = 19
   r  = 19/6 = 3.16

For the steady-state cycle:

   • F = 3
   • F' = 12
   • /r/ = 12/3 = 4.0

Figure 12.
Cyclic Page Trace with FIFO Removal

THEOREM 4:

For any two demand-fetch FIFO-removal two-level storage
systems, S and S', with page sizes N and N'=N/2 and
certain primary store sizes |M¹| and |M¹|'=2|M¹|,
respectively, there exists a cyclic page trace, P =
Pt·(Pc)* where |Pc| = 2(|M¹|+1)(|M¹|), such that the
page fetch frequency ratio, r, exceeds |M¹|+1.

Proof:

By example (Figure 12).
CHAPTER 6.
SPATIAL VS. TEMPORAL LOCALITY MODEL OF PROGRAM BEHAVIOR

6.0 Introduction

Early in this thesis it was explained that a major
rationale for multilevel storage systems is based upon the
Principle of Locality. Unfortunately, locality is still a
poorly understood, or at least controversial, phenomenon. In
this chapter some novel viewpoints and insights will be
presented.

6.1 Types of Program Reference Locality

Let us consider two extreme forms of program reference
locality which will be called temporal locality and spatial
locality:

6.1.1 Temporal Locality

If the logical addresses { a¹, a², ... } are referenced
during the time interval t-T to t, there is a high
probability that these same logical addresses will be
referenced during the time interval t to t+T.
This behavior can be rationalized by program constructs
such as: loops, frequently used variables, and
frequently used subroutines.

6.1.2 Spatial Locality

If the logical address a is referenced at time t, there
is a high probability that a logical address in the
range a-A to a+A will be referenced at time t+1.
This behavior can be rationalized by program constructs
such as: sequential instruction sequencing, and linear
data structures (e.g., arrays).

6.1.3 General Locality

The definitions of temporal and spatial locality above
are quite extreme. Usually we consider only the general
spatio-temporal properties and define locality as:

Locality

If the logical addresses { a¹, a², ... } are referenced
during the time interval t-T to t, there is a high
probability that the logical addresses in the ranges
a¹-A to a¹+A, a²-A to a²+A, ..., will be referenced
during the time interval t to t+T.

It is important to recognize that temporal locality and
spatial locality are indeed the underlying phenomena and
that the "general locality" is merely a simplifying merging
and blurring of these basic concepts.

6.2 Conventional Removal Algorithms

We can begin to understand the factors causing the page
size anomaly by studying how the various conventional
removal algorithms handle temporal and spatial locality. In
particular, we see that whereas temporal locality policies
are given explicit attention, spatial locality policies are
usually handled implicitly and subtly. The "least recently
used", LRU, removal algorithm, for example, is very much
concerned about the temporal aspects of the program's
reference pattern. The spatial aspects are handled as a
by-product of the fact that the demand fetch algorithm must
load an entire page (i.e., a spatial region) at a time and
LRU removal decisions are based upon these pages. With these
thoughts in mind, we can see that decreasing page size
causes the conventional storage management algorithms to
increase their sensitivity to temporal locality and decrease
their sensitivity to spatial locality. Increasing page size,
of course, results in the reverse effect.

6.3 Locality in Actual Programs

Many of the techniques for improving the locality
behavior of programs, such as the method of automatic
program restructuring by sector (subroutine) reordering
described by Hatfield and Gerald [47], result in both
increased temporal and spatial locality. But, it seems that
the reordering technique does, in fact, significantly favor
spatial locality since it was noted [47] that:

"the better orderings not only concentrate
appropriate sectors into pages, but these pages also
naturally cluster into larger units that satisfy
nearness requirements on the page level - and
cluster better than do the pages of the other
orderings ... clustering sectors into pages also
clusters pages into larger units."



6.4 Locality Mixes

An effective multilevel storage management system must
take both temporal and spatial locality into consideration.
As we have seen from both Hatfield's and Seligman's results,
neglecting spatial locality can have disastrous results.
Any given program, or portion of a program's operation, can
have its reference locality characterized by the two-by-two
matrix:

Quadrant 1, low-temporal and low-spatial locality, is
definitely undesirable for operation in a multilevel storage
system. There have been numerous algorithms and programmer
training techniques developed, as mentioned above, to
minimize the number of programs with these poor locality
characteristics. Quadrant 4, high-temporal and high-spatial
locality, has traditionally been the region of best
performance and is usually the objective of good program
design. Unfortunately, it is not always possible or
convenient to design programs which attain both high
temporal and high spatial locality; thus, we find many
programs operating in quadrants 2 or 3.

6.5 Spatial Locality Algorithms

Storage management techniques are needed which provide
far more flexibility and robustness for balancing the
system's sensitivity to temporal and spatial locality. These
algorithms must explicitly consider the spatial locality of
a program. The tuple-coupling approach, described in the
next chapter, is one such technique. It takes advantage of
the temporal locality and compactness possible with small
pages characterized by quadrant 2 behavior, yet it adjusts
to the spatial locality and clustering characterized by
quadrant 3 behavior by simulating the removal policies
associated with large pages.

6.6 Comment on the Page Size Anomaly

With this insight, we can now see that the page size
anomaly is not really even a function strictly of page size!
Instead, it is an issue of locality, temporal versus
spatial.






CHAPTER 7. 
SPATIAL REMOVAL STORAGE MANAGEMENT ALGORITHMS

7.0 Introduction

As stated earlier in this thesis and noted by Hatfield,
a removal algorithm that would limit the page fetch
frequency ratio, r, to 2 would be very desirable. In this
section a technique, called the "tuple-coupling approach",
is described which, when used in conjunction with
conventional removal algorithms, such as LRU or FIFO,
guarantees that r will not exceed 2.

7.1 Tuple-Coupling Approach

The basic concept behind the tuple-coupling approach is
extremely simple. First, the two portions, p+ and p-, of
each original larger page, p, must be identifiable (i.e.,
the set of pages of S' are viewed as a collection of
2-tuples). Second, the removal ordering policies must be
applied to both elements of a tuple (i.e., the tuples are
coupled in regard to ordering decisions) such that a page p+
or p- of S' is never removed unless the corresponding page p
of S would also have been removed from M¹. The particular
implementation of this approach may vary slightly depending
upon the removal algorithm, e.g., LRU, FIFO, etc., that is
to be used. Any removal algorithm to which the
tuple-coupling approach can be incorporated is said to be
"tuple-couple-able".

7.1.1 An Example of LRU Tuple-Coupling

Figure 13 illustrates the application of the
tuple-coupling approach to the LRU removal example
previously shown in Figure 9. It should be noted that, in
this case, r has indeed been limited to 2 although it had a
value of 2.2 when normal LRU removal was used. The reader
should carefully compare Figures 9 and 13 to understand how
the tuple-coupling approach affects the removal algorithm.
The M¹ contents are identical, of course, for S in both
examples, but there are subtle differences in M¹ contents
for S'. Each state of M¹ contents is marked, 1 to 11, in
Figure 13 for reference purposes. Notice that in this
implementation of tuple-coupling whenever both halves of a
page, p+ and p-, are in M¹, they are always adjacent in the
M¹ ordering; compare this with Figure 9.

Figure 13.
Example of LRU Removal with Tuple-Coupling
(see Figure 9 for comparison)

At page trace step 3 we can see the first difference
between Figures 9 and 13. Page a- is referenced and must be
fetched in Figure 9; it is then placed at the top of the M¹
ordering which becomes a-,b+,a+. On the other hand, in
Figure 13 at step 3, it is noticed that a+ was already in
M¹. Thus, when a- is placed at the top of the M¹ ordering,
a+ is coupled to it resulting in the ordering a-,a+,b+. At
page trace step 7 of Figure 13 we see another interesting
example of the tuple-coupling approach. At the previous step
the ordering was

   c- c+ b- b+

When the reference to b+ is made, there is no need to
initiate a fetch since b+ is already in M¹. The M¹ ordering
then becomes

   b+ b- c- c+

since LRU requires that the most recent reference move to
the top. Under this tuple-coupling scheme, b- is also moved
toward the top of the ordering to continue to be adjacent to
b+.
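
The adjacent-halves ordering just described can be sketched
in a few lines of Python. This is our own illustration, not
the thesis implementation: the names lru_fetches and
coupled_lru are hypothetical, half-size pages are written as
strings ending in "+" or "-", and the cyclic trace of
Figure 11 is used as test input because Figures 9 and 13 are
not reproduced here.

```python
from collections import OrderedDict

def lru_fetches(trace, capacity):
    """Demand fetches under ordinary LRU removal (for comparison)."""
    store = OrderedDict()                  # least recently used page first
    fetches = 0
    for page in trace:
        if page in store:
            store.move_to_end(page)
        else:
            fetches += 1
            if len(store) >= capacity:
                store.popitem(last=False)
            store[page] = True
    return fetches

def coupled_lru(trace, capacity, order=None):
    """LRU removal with tuple-coupling for S'.

    Returns (fetches, final ordering). ordering[0] is the top and
    ordering[-1] is the half-size page to be removed next.
    """
    order = list(order or [])
    fetches = 0
    for half in trace:
        sibling = half[:-1] + ("-" if half.endswith("+") else "+")
        if half in order:
            order.remove(half)             # hit: will be moved to the top
        else:
            fetches += 1                   # miss: demand fetch
        if sibling in order:
            order.remove(sibling)
            order.insert(0, sibling)       # the other half is carried along
        order.insert(0, half)              # referenced half goes on top
        while len(order) > capacity:
            order.pop()                    # remove from the bottom
    return fetches, order

HALF = "a+ b+ c+ d+ d- c- b- a-".split() * 2   # as seen by S'
FULL = [h[:-1] for h in HALF]                  # as seen by S
print(lru_fetches(FULL, 3), lru_fetches(HALF, 6), coupled_lru(HALF, 6)[0])
```

On this trace S makes 7 fetches; ordinary LRU in S' makes 16,
while the coupled ordering makes 12, which is within twice
the fetches of S. Calling coupled_lru(["b+"], 4, ["c-", "c+",
"b-", "b+"]) reproduces the step 7 reordering quoted above.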



7.1.2 Implementation of the Tuple-Coupling Approach 

It is important to note that there are often various 
ways to implement tuple-coupling. In particular, in the LRU
tuple-coupling algorithm described above, the 2-tuples,
whenever both portions were in M¹, were arranged to be
adjacent in the M¹ removal ordering. The requirement that
neither portion, p+ or p-, of a tuple in S' be removed
unless the corresponding page of S would have been removed
can be accomplished in other ways. For example, the LRU
removal stack can be left in its normal ordering, as in
Figure 9. In this case, when it is necessary to remove a
page from S' the bottom page is not necessarily the correct
choice to satisfy tuple-coupling. There is an algorithm
which can scan the LRU stack and select the correct page for
removal (in fact, it will select, of course, the same page
selected by the algorithm illustrated in Figure 13).

7.1.3 An Example of FIFO Tuple-Coupling 

It is interesting to consider the effect of 
tuple-coupling upon FIFO removal. Figure 14 illustrates the 
application of the tuple-coupling approach to the FIFO 
removal example previously shown in Figure 6. Once again, 
the page fetch frequency ratio, r, which originally was 2.75 
has indeed been limited to 2. The example of Figure 14 does 
not fully illustrate all the interesting aspects of 
tuple-coupling upon FIFO removal. In particular, if page p+,
for example, is referenced in a page trace and it was not
already in M¹, it must be fetched. The M¹ contents are
reordered as follows:

1. If p- is not currently in M¹, p+ is placed at the top
   of the FIFO ordering.

2. If p- is currently in M¹, p+ is placed immediately
   before p- in the logical FIFO ordering;
   p-'s relative ordering remains unchanged.

The reason for the second part of this rule can be seen from
the normal FIFO ordering rule which places a page p at the
top only if it were not already in M¹. If it were in M¹, it
remains at its previous ordering position. Under
tuple-coupling, this rule applies jointly to the (p+,p-)
tuple as stated above. The reader is encouraged to work
through the example of Figure 10 using the tuple-coupling
approach to illustrate this FIFO ordering phenomenon. The
effect of the tuple-coupling approach is summarized in
Theorem 5.

THEOREM 5:

For any two demand-fetch two-level storage systems, S
and S', with page sizes N and N'=N/2, respectively, the
use of the "tuple-coupling" approach for S' in
conjunction with a removal algorithm that is
"tuple-couple-able" is sufficient to guarantee that the
page fetch frequency ratio, r, cannot exceed the value
2 for all possible page traces, P.

Proof:

(See below).
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7.1.4 Proof of Theorem 5 



As described earlier, when an address trace, A, is 
applied to storage systems S (with page size N) and S' (with 
page size N'=N/2), it can be represented as page traces P 
and P', respectively. At time t1, let us consider a specific 
address reference, a, whose corresponding page references 
are p (in S) and p+ (in S'). In processing this reference 
there are four possible fetch actions in S and S' depending 
upon the current content state of primary store, M1: 

Recall that the page fetch frequency ratio, r, equals 
F'/F. In states 1 and 4 the same action (i.e., no page fetch 
in 1 and a page fetch in 4) occurs in both S and S'; the 
occurrence of these states causes r to tend towards 1. In 
state 3, a page fetch is required in S but not in S'; this 
situation, if frequent, will cause r to decrease toward 
zero. This is usually the intended result of reducing page 
size. Only state 2, in which S' alone requires a page fetch, 
contributes to an increase in r. Thus, we will concentrate 
our analysis on this particular situation. 
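
As a rough illustration, the four states can be written out as a small Python sketch. The state numbering below is inferred from the discussion above (state 1: no fetch in either system; state 2: fetch in S' only; state 3: fetch in S only; state 4: fetch in both); the function names `fetch_state` and `ratio_from_states` are hypothetical and introduced only for this example.

```python
def fetch_state(p_in_s, half_in_s1):
    """Classify a reference by residency of p in S and of its half-page in S'.

    Returns (state, fetch_in_S, fetch_in_S1).
    """
    if p_in_s and half_in_s1:
        return 1, False, False   # resident in both: no fetch
    if p_in_s and not half_in_s1:
        return 2, False, True    # only S' must fetch: pushes r upward
    if not p_in_s and half_in_s1:
        return 3, True, False    # only S must fetch: pushes r toward zero
    return 4, True, True         # both must fetch: pushes r toward 1


def ratio_from_states(states):
    """Compute r = F'/F from a sequence of state numbers (1 to 4)."""
    f = sum(1 for s in states if s in (3, 4))        # fetches in S
    f_prime = sum(1 for s in states if s in (2, 4))  # fetches in S'
    if f == 0:
        raise ValueError("r is undefined when S makes no fetches")
    return f_prime / f
```

A sequence consisting of state 4 followed by state 2 gives F = 1 and F' = 2, hence r = 2; every additional occurrence of state 2 raises r further.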

Since state 2 requires that page p be in M1 at time t1, 
if we scan the address trace backwards, there must be some 
previous reference time t2 that caused page p (in S) to be 
fetched into M1 (this may have been the only previous 
reference to p or the page p may have been fetched and 
removed many times). At time t2, there must also be a 
corresponding reference to either p- or p+ of S'. These two 
cases will be considered separately: 

Case 1:  P  = ... p  ... p 
         P' = ... p- ... p+ 
         t  = ... t2 ... t1 

This case merely illustrates the fact that it can 
require two page fetches (for p+ and p-) in S' to fetch the 
same amount of storage as page p in S. If this were the only 
case for state 2, r would never exceed 2. 
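
Case 1 can be checked with a minimal demand-fetch simulation. The sketch below assumes plain FIFO removal without tuple-coupling and a primary store whose capacity is given in address units; the function names `fifo_fetches` and `page_fetch_ratio` and the parameter names are hypothetical, chosen only for this illustration.

```python
from collections import deque


def fifo_fetches(addresses, page_size, capacity):
    """Count demand fetches under plain FIFO removal.

    capacity is the primary store size in address units, so the
    number of page frames is capacity // page_size.
    """
    frames = capacity // page_size
    resident = deque()  # oldest page at the left
    fetches = 0
    for a in addresses:
        page = a // page_size
        if page in resident:
            continue  # FIFO does not reorder on a hit
        if len(resident) == frames:
            resident.popleft()  # remove the oldest page
        resident.append(page)
        fetches += 1
    return fetches


def page_fetch_ratio(addresses, n, capacity):
    """Return r = F'/F for page sizes N = n (in S) and N' = n // 2 (in S')."""
    f = fifo_fetches(addresses, n, capacity)
    f_prime = fifo_fetches(addresses, n // 2, capacity)
    return f_prime / f
```

With N = 2 and a trace that touches both halves of one page, `page_fetch_ratio([0, 1], n=2, capacity=4)` counts one fetch in S and two in S', so r = 2, which is the Case 1 situation.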

Case 2:  P  = ... p  ... p 
         P' = ... p+ ... p+ 
         t  = ... t2 ... t1 

In this case we see that subsequent to reference t2 
page p of S and page p+ of S' must be in M1. Yet at time t1 
page p of S is still in M1 but page p+ of S' is not. Under 
these circumstances r can certainly exceed 2; merely making 
p- the next reference will account for 3 fetches in S' 
compared to 1 fetch in S. Furthermore, it is possible that 
the references between t2 and t1 could be repeated to 
continually cause fetches for p+ in S' without any 
corresponding fetches required in S. Thus, we see that this 
is precisely the situation that allows r to exceed 2. 

Under closer analysis, we see that this situation 
requires that in S' p+ be removed from M1 between t2 and t1 
whereas in S p remains in M1. In other words, this general 
situation can only occur if at some time t, p+ or p- of S' 
is selected for removal from M1 and the corresponding page p 
of S is not also removed from M1. But, the tuple-coupling 
algorithm (see page 125) is "such that a page p+ or p- of S' 
is never removed unless the corresponding page p of S would 
also have been removed from M1". Thus, the tuple-coupling 
eliminates the possibility of case 2 and therefore 
guarantees that r cannot exceed 2. 

Q.E.D. 

7.2 Effectiveness of Tuple-Coupling 

Clearly, the tuple-coupling approach has an influence 
upon the overall effectiveness of the basic removal 
algorithm being used and the benefits of the smaller page 
size. There are certain reference 
patterns (with r less than 2) for which tuple-coupling 
increases the value of r. On the other hand, it can be 
shown, as a simple exercise for the reader, that the example 
of Figure o retains its low page fetch frequency ratio of 
J.S even when tuple-coupling is used. In fact, 
tuple-coupling may often result in the "best of both worlds" 
by placing a bound on the page fetch frequency ratio, r, for 
high r regions without interfering with the performance of 
originally low r regions. 

A program's reference behavior in S', during a short 
interval of its operation, may be characterized by three 
regions based upon the value of the page fetch frequency 
ratio, r, when tuple-coupling is not used: 

1. Sparse reference - small r (e.g., less than 1). 

2. Moderate reference - moderate r (e.g., between 1 
and 2). 

3. Dense reference - high r (e.g., greater than 2). 

In the sparse reference region, it is unlikely that both 
portions, p+ and p-, of a page, p, will be in M1 
simultaneously; thus, the tuple-coupling will have minimal 
effect upon performance. In the dense reference region, we 
have already seen that tuple-coupling prevents extreme 
values of r. Based upon some recent, though limited, 
measurements, it appears that in the moderate reference 
region tuple-coupling performs about as well as the 
non-tuple-coupled algorithms. 
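
The three regions can be expressed as a simple classifier. This is a sketch only: the thresholds 1 and 2 are the illustrative values given in the list above, the treatment of the boundary values (r equal to 1 or 2 counted as moderate) is an assumption, and the function name `reference_region` is hypothetical.

```python
def reference_region(r):
    """Map a page fetch frequency ratio r (measured without
    tuple-coupling) to one of the three reference regions."""
    if r < 0:
        raise ValueError("r is a ratio of fetch counts and cannot be negative")
    if r < 1:
        return "sparse"    # small r: the two halves are rarely both resident
    if r <= 2:
        return "moderate"  # boundary values are assigned here by assumption
    return "dense"         # high r: the region that tuple-coupling bounds
```

Applied to the FIFO example of Figure 6, whose uncoupled ratio is 2.75, the classifier returns "dense".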

CHAPTER 8. 
DISCUSSION AND CONCLUSIONS 

8.0 Introduction 

Efficient and effective storage management is important 
to the development of future computer systems. It has been 
estimated that the storage subsystems account for over 70% 
of the cost of most contemporary installations and, based 
upon present trends, this percentage is expected to 
increase. 

Much more research will be needed before all the 
problems of automatic storage management are understood and 
the obstacles to effective operation eliminated. This 
thesis has solved several open problems and has provided 
insight that should lead to the solution of many more 
problems. 

8.1 Summary 

A detailed discussion of the many facets of storage 
management is presented in Chapter 2. It also contains a 
general discussion of the requirements which a system must 
satisfy to be effective for the user. 

In Chapters 3 and 4 a model for storage hierarchy 
systems is formalized and an implementation is proposed. The 
system's design is based upon an orderly and uniform 
treatment of the storage levels. Specific techniques to 
improve performance, such as continuous hierarchy, shadow 
storage, direct transfer, read through, store behind, and 
automatic management, are explained. 

In Chapter 5 the "page size anomaly" is presented (see 
also Hatfield [48]): 



"The assumption about virtual memory systems that as 
overhead (time for access and software page 
management) decreases page size should be reduced is 
not always a good one. Recent experiments indicate 
that larger sizes can provide better performance for 
programs that make highly localized use of memory 
space." 



This phenomenon is formalized and a bound on the performance 
is proven. 

In Chapters 6 and 7 the concept of spatial locality is 
introduced and serves as the basis for a new storage removal 
algorithm called "tuple-coupling". These concepts are used 
to explain the occurrence of the "page size anomaly" in 
actual systems. It is proven that the tuple-coupling 
approach is a sufficient strategy to avoid the occurrence of 
the "page size anomaly", and it offers potential performance 
improvements for the storage hierarchy system. 

The techniques and theorems presented in this thesis 
provide a much more scientifically sound basis for examining 
and designing storage hierarchy systems than most current ad 
hoc approaches. Although there is still a long way to go, 
development of these formalisms is essential to 
advancing the "science" in Computer Science. 

8.2 Further Work 

There are many areas touched on by this work in which 
questions remain. One of the most significant is the 
development and study of other possible "spatial locality" 
removal algorithms in addition to the tuple-coupling 
approach studied in this thesis. This is an entirely 
open area. 

Although tuple-coupling is studied extensively in this 
thesis, there are still many unanswered questions. How does 
tuple-coupling compare with the class of "stack" algorithms 
studied by Mattson et al. [63]; in particular, under what 
circumstances, if any, is tuple-coupling a stack algorithm? 
Likewise, how does tuple-coupling compare with the 
theoretically optimal replacement algorithm, called OPT [63] 
or MIN [12]? On a more practical side, how efficiently can a 
tuple-coupling algorithm, or other spatial removal 
algorithms, be implemented? 

In order to obtain specific proof of the utility and 
efficiency of general storage hierarchies, it will be 
necessary to actually construct and measure the performance 
of such a system or, at least, perform more extensive 
simulation analysis. Furthermore, we must develop overall 
programming techniques and execution environments that are 
even more amenable to efficient operation in a storage 
hierarchy system. 

Many of these questions are currently under 
investigation; the results will be published later in an MIT 
Project MAC Technical Report. 